WO2022022670A1 - Neural network computation graph processing method and apparatus, and processing device - Google Patents


Info

Publication number
WO2022022670A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
calculation
graph
processor
neural network
Application number
PCT/CN2021/109499
Other languages
French (fr)
Chinese (zh)
Inventor
戚海涛
李涵
吴欣洋
丁瑞强
祝夭龙
Original Assignee
北京灵汐科技有限公司
Application filed by 北京灵汐科技有限公司
Publication of WO2022022670A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • The present invention relates to the technical field of network communication, and in particular to a method for processing a neural network computation graph, a processing apparatus, a processing device, and a readable storage medium.
  • A neural network computation graph consists of a variety of operators, which can be regarded as operation functions such as convolution and pooling.
  • CPU: Central Processing Unit
  • APU: Accelerated Processing Unit
  • The present invention provides a method for processing a neural network computation graph, a processing apparatus, a processing device, and a readable storage medium.
  • A first aspect provides a method for processing a neural network computation graph, comprising: determining a processor type corresponding to each operator according to attribute information of each operator in the neural network computation graph; dividing the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators, where all operators in each computation subgraph correspond to one kind of processor; and processing each computation subgraph by the processor corresponding to it.
  • A processing apparatus is also provided, comprising: a determination module, configured to determine a processor type corresponding to each operator according to attribute information of each operator in a neural network computation graph; a division module, configured to divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators, each computation subgraph corresponding to one kind of processor; and a processing module, configured to process each computation subgraph by the processor corresponding to it.
  • A processing device is also provided, comprising a processor, a memory, and a program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the method for processing a neural network computation graph according to the first aspect.
  • A readable storage medium is also provided, storing a program that, when executed by a processor, implements the steps of the method for processing a neural network computation graph according to the first aspect.
  • In the embodiments of the present invention, the processor type corresponding to each operator is determined according to the attribute information of the operator, and the neural network computation graph is divided into one or more computation subgraphs according to the processor types of the operators and the dependencies among them, each computation subgraph corresponding to one kind of processor. Each computation subgraph is then processed by its corresponding processor.
  • This makes effective use of computing resources and improves the processing speed of the neural network computation graph. Because processor types are chosen from the attribute information of the operators, it also addresses the slow processing that occurs when certain types of processors cannot process certain operators, or process them inefficiently.
  • FIG. 1 is a flowchart of a method for processing a neural network computation graph according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a specific implementation of dividing a neural network computation graph into one or more computation subgraphs.
  • FIG. 3 is a schematic diagram of a neural network computation graph.
  • FIG. 4 is a flowchart of a specific implementation of processing a computation subgraph by the processor corresponding to the computation subgraph.
  • FIG. 5 is a block diagram of a processing apparatus provided by an embodiment of the present invention.
  • FIG. 6 is a block diagram of the composition of a processing device according to an embodiment of the present invention.
  • Words such as “exemplary” or “for example” are used to mean serving as an example, instance, or illustration. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present invention should not be construed as preferred or advantageous over other embodiments or designs; rather, such words are intended to present the related concepts in a concrete manner.
  • FIG. 1 is a flowchart of a method for processing a neural network computation graph according to an embodiment of the present invention.
  • Referring to FIG. 1, an embodiment of the present invention provides a method for processing a neural network computation graph, comprising steps 101, 102, 103, and 104.
  • Step 101: Obtain the attribute information of all operators in the neural network computation graph (computational graph).
  • A neural network may be a computing network composed of operators (OPs).
  • The neural network computation graph may be composed of multiple operators, and an operator can be regarded as an operation function, such as convolution, pooling, or full connection.
  • The neural network may be, for example, a tree-structured neural network model; the embodiments of the present invention do not limit the specific implementation form of the neural network model.
  • Each operator has its own operation process and one or more attributes.
  • The attribute information of an operator may include one or more of the following: calculation amount, operator function, and operator parameters.
  • Step 102: Determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network computation graph.
  • Processor types may include a central processing unit (CPU), an accelerated processing unit (APU), a graphics processing unit (GPU), a tensor processing unit (TPU), and the like.
  • The processor types in the embodiments of the present invention are not limited to these and may include other types of processors.
  • The calculation amounts, operator functions, and operator parameters of the operators suited to each type of processor may be determined in advance, establishing a correspondence between these attributes and processor types. In step 102, the processor type corresponding to each operator of the neural network computation graph can then be obtained from this predetermined correspondence. The processor type corresponding to an operator is the type of processor used to process that operator.
  • For example, the neural network computation graph includes operator 1 and operator 2. The attribute information of operator 1 includes one or more of calculation amount A1, operator function A2, and operator parameter A3; if, according to the correspondence above, the processor type corresponding to A1, A2, and A3 is a central processing unit, then the processor type corresponding to operator 1 is a central processing unit.
  • The attribute information of operator 2 includes one or more of calculation amount B1, operator function B2, and operator parameter B3; if, according to the correspondence above, the processor type corresponding to B1, B2, and B3 is an accelerated processing unit, then the processor type corresponding to operator 2 is an accelerated processing unit.
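The predetermined correspondence used in step 102 can be pictured as a lookup table. The following Python sketch is illustrative only: the `Operator` record, the operator-function names, the calculation-amount thresholds, and the CPU fallback are all hypothetical assumptions. Only the operator function and the calculation amount are used as keys here; operator parameters could be added as a further key in the same way.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    function: str                               # operator function, e.g. "conv2d" (hypothetical name)
    flops: int                                  # calculation amount
    params: dict = field(default_factory=dict)  # operator parameters (unused in this sketch)


# Hypothetical predetermined correspondence, checked in order (first match wins):
# (operator function, minimum calculation amount) -> processor type.
CORRESPONDENCE = [
    ("conv2d", 1_000_000, "APU"),
    ("matmul", 1_000_000, "APU"),
    ("pooling", 0, "APU"),
]
DEFAULT_TYPE = "CPU"  # assumed fallback for operators with no matching entry


def processor_type(op: Operator) -> str:
    """Look up the processor type for one operator from its attributes."""
    for function, min_flops, ptype in CORRESPONDENCE:
        if op.function == function and op.flops >= min_flops:
            return ptype
    return DEFAULT_TYPE


ops = [Operator("op1", "nms", 5_000), Operator("op2", "conv2d", 50_000_000)]
print({op.name: processor_type(op) for op in ops})  # {'op1': 'CPU', 'op2': 'APU'}
```

A rule list rather than a plain dictionary is used so that the same operator function can map to different processor types depending on its calculation amount.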
  • Step 103: Divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators, such that all operators in each computation subgraph correspond to one kind of processor.
  • Each resulting computation subgraph may correspond to one type of processor.
  • For example, the neural network computation graph is divided into computation subgraph 1 and computation subgraph 2. All operators in computation subgraph 1 correspond to the CPU, that is, they can all be processed by a CPU; all operators in computation subgraph 2 correspond to the APU, that is, they can all be processed by an APU.
  • The number of computation subgraphs into which the computation graph is divided is not limited.
  • Step 104: Process each computation subgraph by the processor corresponding to it.
  • For example, the processor corresponding to a computation subgraph compiles that subgraph, or compiles and executes it.
  • All operators in a computation subgraph have the same processor type; that is, for each computation subgraph, all of its operators are processed by a processor of the corresponding processor type.
  • Optionally, all operators in each computation subgraph correspond to a single processor of one processor type; that is, each computation subgraph is processed by one processor of the corresponding type, which can effectively improve the processing efficiency of the computation graph.
  • Multiple computation subgraphs may correspond to the same processor of one processor type, that is, that single processor processes all of them. For example, if computation subgraph 1 and computation subgraph 2 both correspond to the central processing unit type, both can be processed by the same central processing unit.
  • Alternatively, multiple computation subgraphs may correspond to different processors of the same processor type, that is, they are processed by several processors of that type. For example, if computation subgraph 1 and computation subgraph 2 both correspond to the central processing unit type, computation subgraph 1 can be processed by central processing unit 1 and computation subgraph 2 by central processing unit 2, where central processing units 1 and 2 are different processors of the same processor type.
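The two alternatives above (several subgraphs of one type sharing a single processor, or being spread across several processors of that type) can be sketched as a small assignment function. The processor-instance names and the round-robin policy are hypothetical assumptions for illustration, not a scheduling policy stated in the text.

```python
from itertools import cycle


def assign_processors(subgraph_types, instances, share=True):
    """Map each computation subgraph to a processor instance of its type.

    subgraph_types: {subgraph name: processor type}
    instances: {processor type: [instance names]} (hypothetical inventory)
    share=True:  every subgraph of a type goes to the first instance of that type.
    share=False: subgraphs of a type are spread round-robin over its instances.
    """
    pools = {ptype: cycle(names) for ptype, names in instances.items()}
    assignment = {}
    for sg, ptype in subgraph_types.items():
        assignment[sg] = instances[ptype][0] if share else next(pools[ptype])
    return assignment


types = {"subgraph 1": "CPU", "subgraph 2": "CPU"}
cpus = {"CPU": ["CPU 1", "CPU 2"]}
print(assign_processors(types, cpus, share=True))   # {'subgraph 1': 'CPU 1', 'subgraph 2': 'CPU 1'}
print(assign_processors(types, cpus, share=False))  # {'subgraph 1': 'CPU 1', 'subgraph 2': 'CPU 2'}
```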
  • In this way, the processor type corresponding to each operator is determined according to its attribute information, and the neural network computation graph is divided into one or more computation subgraphs according to these processor types and the dependencies among the operators, each computation subgraph corresponding to one processor, which then processes it. This makes effective use of computing resources, improves the processing speed of the neural network computation graph, and addresses the slow processing caused by certain types of processors being unable to process certain operators, or processing them inefficiently.
  • FIG. 2 is a flowchart of a specific implementation of dividing a neural network computation graph into one or more computation subgraphs.
  • As shown in FIG. 2, step 103 specifically includes steps 201 to 203.
  • Step 201: Traverse each computation branch of the neural network computation graph along a first direction, where each computation branch includes multiple nodes and each node corresponds to an operator.
  • The neural network computation graph may include multiple computation branches, each of which may include multiple nodes; each node corresponds to an operator in the computation graph.
  • Step 202: For the current computation branch, traverse each node on it along a second direction, where the first direction intersects the second direction.
  • For example, the second direction is the reverse of the processing order of the operators in the computation graph.
  • For example, if the operators in the computation graph are processed from top to bottom, the second direction is from bottom to top; correspondingly, the first direction may be from left to right or from right to left.
  • Step 203: Based on the traversal order, divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each node, the dependencies among the nodes, and preset division rules.
  • The dependencies among the nodes may include the hierarchical relationship of the operators in the neural network and/or the processing order of the operators.
  • For example, the dependencies may include parent-child relationships, sibling relationships, and the like.
  • A parent node has child nodes; child nodes at the same level are called sibling nodes.
  • The topmost node is called the root node; every node other than the root node has a parent node.
  • The processing order of the nodes refers to the compilation order, or the compilation and execution order, of the operators in the neural network computation graph.
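The parent-child, sibling, and root relationships can all be derived from a single parent-to-children map. A minimal Python sketch, assuming the graph is stored as such a map with every node present as a key; the node names are hypothetical.

```python
def relations(children):
    """Derive parents, roots, and siblings from a parent -> children map.

    Every node must appear as a key; leaf nodes map to an empty list.
    """
    parents = {node: [] for node in children}
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].append(parent)
    roots = [node for node, ps in parents.items() if not ps]
    # Siblings: the other children of any of the node's parents.
    siblings = {
        node: sorted({s for p in parents[node] for s in children[p] if s != node})
        for node in children
    }
    return parents, roots, siblings


graph = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
parents, roots, siblings = relations(graph)
print(parents)   # {'root': [], 'a': ['root'], 'b': ['root'], 'c': ['a']}
print(roots)     # ['root']
print(siblings)  # {'root': [], 'a': ['b'], 'b': ['a'], 'c': []}
```

A node may have more than one parent in a computation graph (in FIG. 3 below, node N8 feeds from both N7 and N9), which is why parents are kept as lists.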
  • The traversal order is the one determined in steps 201 and 202: computation branches are traversed along the first direction, and the nodes of the current computation branch are traversed along the second direction. Following this order, each computation branch and each node on it is visited in turn to determine the computation subgraph to which the node belongs, thereby producing one or more computation subgraphs.
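The traversal order of steps 201 and 202 can be sketched as follows, under the assumption (not stated in the text) that a computation branch is a root-to-leaf path. Branches are listed left to right (first direction), and each is reversed so that its nodes run bottom to top (second direction). A node shared by several branches appears in each of them; during division it is simply not placed again once it already belongs to a subgraph. The function name and toy graph are hypothetical.

```python
def branches(children, root):
    """List computation branches, assumed here to be root-to-leaf paths.

    Branches come out left to right (first direction) because children are
    stored left to right; each branch is reversed so that its nodes run
    bottom to top (second direction).
    """
    result = []

    def walk(node, path):
        path = path + [node]
        if not children[node]:      # a leaf closes one branch
            result.append(path[::-1])
        for child in children[node]:
            walk(child, path)

    walk(root, [])
    return result


graph = {"r": ["x", "y"], "x": ["z"], "y": ["z"], "z": []}
print(branches(graph, "r"))  # [['z', 'x', 'r'], ['z', 'y', 'r']]
```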
  • The preset division rules may include a first division rule.
  • First division rule: for the current node on the current computation branch, if the current node and the previous node correspond to the same processor type, the current node is placed in the computation subgraph of the previous node; if they correspond to different processor types, the current node and the previous node are placed in different computation subgraphs.
  • If the current node is the first node of the first computation branch in the traversal order, it has no previous node; it is placed directly in a new computation subgraph, and traversal proceeds to the next node. If the current node is not the first node of the first computation branch, then, given the traversal order, the current node and its previous node are in a parent-child relationship: the current node is the parent node and the previous node is the child node.
  • Different computation subgraphs contain different nodes, that is, no node belongs to more than one computation subgraph.
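The first division rule, together with the handling of the first node, can be sketched as a single function. This is a hypothetical helper: subgraphs are identified by integer ids, and a dictionary from node to id guarantees that each node belongs to exactly one computation subgraph. The chain and processor types in the usage lines are made up for illustration.

```python
def apply_first_rule(node, prev, proc_type, subgraph_of, next_id):
    """First division rule (sketch).

    Places `node` into a subgraph and returns (its subgraph id, next unused id).
    subgraph_of maps already-placed nodes to subgraph ids.
    """
    if prev is None or proc_type[node] != proc_type[prev]:
        # First node overall, or the processor type changes: open a new subgraph.
        subgraph_of[node] = next_id
        return next_id, next_id + 1
    # Same processor type as the previous node: join its subgraph.
    subgraph_of[node] = subgraph_of[prev]
    return subgraph_of[prev], next_id


# A single branch traversed bottom to top (hypothetical nodes and types).
types = {"n3": "APU", "n2": "APU", "n1": "CPU"}
subgraph_of, next_id, prev = {}, 0, None
for node in ["n3", "n2", "n1"]:
    _, next_id = apply_first_rule(node, prev, types, subgraph_of, next_id)
    prev = node
print(subgraph_of)  # {'n3': 0, 'n2': 0, 'n1': 1}
```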
  • The preset division rules may further include a second division rule.
  • Second division rule: if the next node of the current node has multiple child nodes in the neural network computation graph and the current node is not the last of those child nodes along the first direction, the next node is not traversed, and traversal moves on to the nodes of the next computation branch; if the next node has multiple child nodes and the current node is the last of them along the first direction, traversal continues with the next node on the current computation branch.
  • Here, the current node and its next node are in a parent-child relationship: the next node is the parent node and the current node is the child node.
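The second division rule reduces to a check on the next node's child list. A minimal sketch, assuming each node's children are stored in a list ordered along the first direction; the function name is hypothetical, and the usage lines use the relationships of nodes N1 and N2 from FIG. 3.

```python
def continue_on_branch(node, next_node, children):
    """Second division rule (sketch).

    Returns True if traversal may move from `node` to `next_node` (its parent)
    on the current branch, False if it should jump to the next branch.
    children[p] lists p's children ordered along the first direction.
    """
    kids = children[next_node]
    if len(kids) > 1:
        return kids[-1] == node  # only the last child continues upward
    return True


children = {"N1": ["N2"], "N2": ["N3", "N7", "N9"]}  # from FIG. 3
print(continue_on_branch("N3", "N2", children))  # False: move to the next branch
print(continue_on_branch("N9", "N2", children))  # True: N9 is N2's last child
print(continue_on_branch("N2", "N1", children))  # True: N1 has a single child
```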
  • The preset division rules may further include a third division rule.
  • Third division rule: if the computation subgraph containing the current node has multiple parent nodes in the neural network computation graph, and at least one of those parent nodes corresponds to a processor type different from that of the current node, then each of those parent nodes belongs to a computation subgraph different from that of the current node.
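One reading of the third division rule is sketched below: collect the parent nodes of the subgraph that lie outside it, and block all of them from joining if there are several and at least one has a different processor type. This interpretation (in particular, counting only parents outside the subgraph) is an assumption chosen to match the FIG. 3 walkthrough; the function name and data layout are hypothetical.

```python
def parents_must_split(members, parents, proc_type):
    """Third division rule (sketch).

    members: set of nodes already in one computation subgraph.
    parents: {node: [parent nodes]}; proc_type: {node: processor type}.
    Returns True if none of the subgraph's outside parents may join it.
    """
    sub_type = proc_type[next(iter(members))]  # all members share one type
    outside = {p for n in members for p in parents.get(n, []) if p not in members}
    return len(outside) > 1 and any(proc_type[p] != sub_type for p in outside)


# Parent lists and processor types of FIG. 3 (N7 is CPU, all others APU).
PARENTS = {"N2": ["N1"], "N3": ["N2"], "N7": ["N2"], "N9": ["N2"], "N4": ["N3"],
           "N5": ["N4"], "N6": ["N5", "N8"], "N8": ["N7", "N9"]}
TYPES = {n: "APU" for n in ["N1", "N2", "N3", "N4", "N5", "N6", "N8", "N9"]}
TYPES["N7"] = "CPU"

# Subgraph A before and after N8 joins it.
print(parents_must_split({"N3", "N4", "N5", "N6"}, PARENTS, TYPES))        # False
print(parents_must_split({"N3", "N4", "N5", "N6", "N8"}, PARENTS, TYPES))  # True
```

With N8 included, subgraph A's outside parents are N2, N7, and N9, and N7 runs on the CPU, so N9 may not join A.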
  • Based on the above division rules, the neural network computation graph can be divided into one or more computation subgraphs.
  • For example, the neural network computation graph may be divided into a single computation subgraph, in which case the computation subgraph is the neural network computation graph itself.
  • For example, if the neural network computation graph includes multiple branches and, on each branch, the operators of parent nodes and child nodes have different processor types, the division rules above divide the graph into multiple computation subgraphs.
  • FIG. 3 is a schematic diagram of a neural network computation graph.
  • As shown in FIG. 3, the neural network computation graph includes nodes N1 to N9, which are processed from top to bottom.
  • Taking the dependencies shown in FIG. 3 as an example, with the second direction from bottom to top and the first direction from left to right, the preset division rules of the embodiments of the present invention are illustrated below.
  • Suppose the processor type corresponding to the operator of node N7 is determined to be CPU, and the processor type corresponding to the operators of the other nodes (N1 to N6, N8, and N9) is determined to be APU. All operators in the network are then divided based on the processor type of each operator, the dependencies among the operators, and the preset division rules.
  • The first computation branch is the leftmost branch, that is, the branch formed by nodes N6, N5, N4, N3, and N2.
  • In the traversal order, the first node of this branch is N6. Since N6 is the first node of the first computation branch, it is placed in a new computation subgraph, and traversal continues with the next node, N5.
  • Node N5 is the parent node of node N6, and N5 and N6 correspond to the same processor type, so according to the first division rule they are placed in the same computation subgraph, defined as computation subgraph A.
  • Node N4 is the parent node of N5, and N4 and N5 correspond to the same processor type; according to the first division rule, N4 is placed in computation subgraph A.
  • Node N3 is the parent node of node N4, and its processor type is the same as that of computation subgraph A; according to the first division rule, N3 is placed in computation subgraph A.
  • On the first computation branch, the parent node of node N3 is node N2. Node N2 has multiple child nodes (N3, N7, and N9), and N3 is not the last child of N2 along the first direction; the last one is N9. Therefore, according to the second division rule, node N2 is not traversed on the first computation branch; instead, traversal moves to the nodes of the next computation branch, that is, the branch formed by nodes N6, N8, N7, and N2.
  • On this branch, node N6 has already been traversed and need not be traversed again, so node N8 is traversed next.
  • The processor type of node N8 is the same as that of node N6, that is, the same as that of computation subgraph A, so node N8 is placed in computation subgraph A according to the first division rule.
  • Following the traversal order, traversal continues with node N7, the next node after N8.
  • The processor type of node N7 differs from that of node N8, so node N7 is placed in a computation subgraph of its own, defined as computation subgraph B.
  • The parent node of node N7 is node N2, and N7 is not the last child node of N2 along the first direction. According to the second division rule, N2 is not traversed; traversal continues with the nodes of the next computation branch, that is, the branch formed by nodes N6, N8, N9, N2, and N1.
  • On this branch, nodes N6 and N8 have already been traversed and need not be traversed again, so node N9 is traversed next. Computation subgraph A has three parent nodes: N2, N7, and N9. Among them, node N7 corresponds to a processor type different from that of computation subgraph A: N7 corresponds to the CPU, whereas computation subgraph A corresponds to the APU. According to the third division rule, each of nodes N2, N7, and N9 must belong to a computation subgraph different from computation subgraph A.
  • Therefore, although node N9 has the same processor type as computation subgraph A, it is not placed in computation subgraph A; it is placed in a computation subgraph of its own, defined as computation subgraph C.
  • The parent node of node N9 is node N2, and N9 is the last child node of N2 along the first direction. Therefore, according to the second division rule, node N2 is traversed; since N2 and N9 correspond to the same processor type, N2 is placed in computation subgraph C according to the first division rule.
  • The root node N1 has the same processor type as node N2, so N1 is placed in computation subgraph C according to the first division rule.
  • This yields computation subgraphs A, B, and C.
  • Computation subgraphs A and C both correspond to the APU processor type, and computation subgraph B corresponds to the CPU.
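The walkthrough above can be checked end to end with a compact Python sketch that combines the three division rules. It rests on interpretive assumptions: a computation branch is taken to be a root-to-leaf path, already-placed nodes are skipped, and the third rule looks at parents lying outside the current subgraph. The graph encodes FIG. 3 as described in the text; all function names are hypothetical.

```python
from collections import defaultdict

# FIG. 3 as described in the text: parent -> children, children listed left to right.
CHILDREN = {
    "N1": ["N2"], "N2": ["N3", "N7", "N9"], "N3": ["N4"], "N4": ["N5"],
    "N5": ["N6"], "N7": ["N8"], "N9": ["N8"], "N8": ["N6"], "N6": [],
}
PROC = {n: "APU" for n in CHILDREN}
PROC["N7"] = "CPU"


def branches(children, root):
    """Root-to-leaf paths, left to right, each reversed to run bottom to top."""
    out = []

    def walk(node, path):
        path = path + [node]
        if not children[node]:
            out.append(path[::-1])
        for child in children[node]:
            walk(child, path)

    walk(root, [])
    return out


def partition(children, proc, root):
    parents = defaultdict(list)
    for p, cs in children.items():
        for c in cs:
            parents[c].append(p)
    subgraph_of, members = {}, defaultdict(set)

    def third_rule_blocks(sg):
        # Parents of the subgraph that lie outside it.
        outside = {p for n in members[sg] for p in parents[n] if p not in members[sg]}
        sg_type = proc[next(iter(members[sg]))]
        return len(outside) > 1 and any(proc[p] != sg_type for p in outside)

    for branch in branches(children, root):
        for i, node in enumerate(branch):
            if node not in subgraph_of:          # placed nodes are not divided again
                prev = branch[i - 1] if i > 0 else None
                if (prev is not None and proc[node] == proc[prev]
                        and not third_rule_blocks(subgraph_of[prev])):
                    sg = subgraph_of[prev]       # first rule: join the previous node's subgraph
                else:
                    sg = len(members)            # open a new subgraph
                subgraph_of[node] = sg
                members[sg].add(node)
            # Second rule: leave this branch unless `node` is the next node's last child.
            if i + 1 < len(branch):
                kids = children[branch[i + 1]]
                if len(kids) > 1 and kids[-1] != node:
                    break
    return members


result = partition(CHILDREN, PROC, "N1")
for sg, nodes in sorted(result.items()):
    print("ABC"[sg], sorted(nodes), PROC[next(iter(nodes))])
# A ['N3', 'N4', 'N5', 'N6', 'N8'] APU
# B ['N7'] CPU
# C ['N1', 'N2', 'N9'] APU
```

Subgraph ids 0, 1, and 2 are created in the order in which the walkthrough creates A, B, and C, so the printed groups line up with the computation subgraphs described above.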
  • For example, computation subgraphs A and C may be processed by the same APU, while computation subgraph B is processed by the CPU.
  • Alternatively, computation subgraph A may be processed by one APU, computation subgraph C by another APU, and computation subgraph B by the CPU.
  • In the embodiments of the present invention, each computation subgraph corresponds to one processor, and the processor type of each computation subgraph is determined from the attribute information of the operators in it. This makes effective use of computing resources, improves the processing speed of the neural network computation graph, and addresses the slow processing caused by certain types of processors being unable to process certain operators, or processing them inefficiently.
  • FIG. 4 is a flowchart of a specific implementation of processing a computation subgraph by the processor corresponding to the computation subgraph.
  • As shown in FIG. 4, the step of processing each computation subgraph by its corresponding processor, that is, step 104, may further include steps 401 to 403.
  • Step 401: Determine the dependencies among the computation subgraphs according to the dependencies among the operators in the neural network computation graph.
  • Step 402: Obtain the processing order of the computation subgraphs according to the dependencies among them.
  • The dependencies among the computation subgraphs represent their hierarchical relationship or processing order in the computation graph; from them, the compilation order, or the compilation and execution order, of the computation subgraphs can be determined.
  • For example, computation subgraphs A, B, and C are obtained with the division method above. From the dependencies among their operators, the output of computation subgraph C is connected to node N7 of computation subgraph B and to nodes N3 and N8 of computation subgraph A, and the output of computation subgraph B is connected to node N8 of computation subgraph A. Accordingly, computation subgraph C is processed first, then B, then A.
  • Step 403: Process each computation subgraph by its corresponding processor according to the processing order of the computation subgraphs.
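Steps 401 to 403 can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). The operator edges and subgraph membership below are taken from the FIG. 3 example; the `run` callback standing in for compilation or execution on a processor is a hypothetical placeholder.

```python
from graphlib import TopologicalSorter

# Operator edges of FIG. 3 (producer -> consumer) and the division into A, B, C.
EDGES = [("N1", "N2"), ("N2", "N3"), ("N2", "N7"), ("N2", "N9"), ("N3", "N4"),
         ("N4", "N5"), ("N5", "N6"), ("N7", "N8"), ("N9", "N8"), ("N8", "N6")]
SUBGRAPH = {"N3": "A", "N4": "A", "N5": "A", "N6": "A", "N8": "A",
            "N7": "B", "N1": "C", "N2": "C", "N9": "C"}
PROC = {"A": "APU", "B": "CPU", "C": "APU"}


def subgraph_dependencies(edges, subgraph):
    """Step 401: an edge between operators in different subgraphs makes the
    consumer's subgraph depend on the producer's subgraph."""
    deps = {sg: set() for sg in set(subgraph.values())}
    for src, dst in edges:
        if subgraph[src] != subgraph[dst]:
            deps[subgraph[dst]].add(subgraph[src])
    return deps


def processing_order(deps):
    """Step 402: a topological order of the subgraph dependency graph."""
    return list(TopologicalSorter(deps).static_order())


def process_all(order, proc, run):
    """Step 403: hand each subgraph, in order, to a processor of its type."""
    return [run(proc[sg], sg) for sg in order]


deps = subgraph_dependencies(EDGES, SUBGRAPH)
order = processing_order(deps)
print(order)  # ['C', 'B', 'A']
print(process_all(order, PROC, lambda ptype, sg: f"{ptype} processes {sg}"))
```

A topological order is used because it is the standard way to turn a dependency graph into a sequence in which every subgraph runs after all subgraphs it consumes from.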
  • Dividing the operators into one or more computation subgraphs according to their processor types and the dependencies among them, and then processing each computation subgraph in order by its corresponding processor, makes it easy to configure the input-output relationships between processors, ensures the accuracy of operator computation in the neural network, and improves processing efficiency.
  • FIG. 5 is a block diagram of a processing apparatus provided by an embodiment of the present invention.
  • As shown in FIG. 5, an embodiment of the present invention provides an apparatus 500 for processing a neural network computation graph.
  • The processing apparatus 500 is used to process a neural network computation graph.
  • The apparatus 500 includes an acquisition module 501, a determination module 502, a division module 503, and a processing module 504.
  • The acquisition module 501 is configured to obtain the attribute information of all operators in the neural network computation graph.
  • The attribute information of an operator includes one or more of the following: calculation amount, operator function, and operator parameters.
  • The determination module 502 is configured to determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network computation graph.
  • The division module 503 is configured to divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators, each computation subgraph corresponding to one processor.
  • The processing module 504 is configured to process each computation subgraph by the processor corresponding to it.
  • Optionally, all operators in a computation subgraph correspond to the same processor type. In some embodiments, multiple computation subgraphs corresponding to the same processor type correspond to the same processor.
  • Optionally, the division module 503 is further configured to: traverse each computation branch of the neural network computation graph along the first direction, where each computation branch includes multiple nodes and each node corresponds to an operator; for the current computation branch, traverse each node on it along the second direction, the first direction intersecting the second direction; and, based on the traversal order, divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each node, the dependencies among the nodes, and the preset division rules.
  • The preset division rules include a first division rule.
  • The first division rule is: for the current node on the current computation branch, if the current node and the previous node correspond to the same processor type, the current node is placed in the computation subgraph of the previous node; if they correspond to different processor types, the current node and the previous node are placed in different computation subgraphs. No node belongs to more than one computation subgraph.
  • The preset division rules further include a second division rule.
  • The second division rule is: if the next node of the current node has multiple child nodes in the neural network computation graph and the current node is not the last of those child nodes along the first direction, traversal moves on to the nodes of the next computation branch; if the next node has multiple child nodes and the current node is the last of them along the first direction, traversal continues with the next node on the current computation branch.
  • The preset division rules further include a third division rule.
  • The third division rule is: if the computation subgraph containing the current node has multiple parent nodes in the neural network computation graph, and at least one of those parent nodes corresponds to a processor type different from that of the current node, then each of those parent nodes belongs to a computation subgraph different from that of the current node.
  • the processing module 504 is further configured to: determine the dependency between the respective calculation subgraphs according to the dependency between the multiple operators in the neural network calculation graph; According to the dependency relationship, the processing order of each calculation subgraph is obtained; according to the processing sequence, the corresponding calculation subgraph is processed by the processor corresponding to the calculation subgraph.
  • the processing apparatus 500 for processing a neural network computation graph provided by the embodiment of the present invention can implement the various steps and processes of the above-mentioned processing method, and can achieve the same technical effect.
  • FIG. 6 is a block diagram of the composition of a processing device according to an embodiment of the present invention.
  • An embodiment of the present invention further provides a processing device 600. The processing device 600 comprises a processor 601, a memory 602, and a program or instructions stored in the memory 602 and executable on the processor 601. When executed by the processor 601, the program or instructions implement each step and process of the processing method described above and achieve the same technical effects.
  • An embodiment of the present invention further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each step and process of the processing method described above and achieve the same technical effects. To avoid repetition, details are not repeated here.
  • The functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof.
  • The functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code.
  • Computer-readable media include both computer storage media and communication media. Communication media include any medium that facilitates transfer of a computer program from one place to another.
  • A storage medium can be any available medium that can be accessed by a general-purpose or special-purpose computer.
  • Embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
  • Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. Each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
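The ordering step performed by the processing module 504 can be sketched as a topological sort over subgraph-level dependencies. This is a minimal Python sketch under stated assumptions, not the patented implementation. The input shapes are hypothetical: `op_edges` holds producer/consumer operator pairs, and `op_to_subgraph` maps each operator to its subgraph id. The function name `subgraph_order` is also hypothetical, and Kahn's algorithm is one assumed way to derive the order.

```python
from collections import defaultdict, deque


def subgraph_order(op_edges, op_to_subgraph):
    """Derive a processing order for computation subgraphs (hypothetical helper).

    op_edges: list of (producer, consumer) operator pairs.
    op_to_subgraph: dict mapping each operator to its subgraph id.
    """
    succ = defaultdict(set)
    indeg = {sg: 0 for sg in set(op_to_subgraph.values())}
    # Lift operator-level edges to subgraph-level edges, ignoring internal ones.
    for src, dst in op_edges:
        a, b = op_to_subgraph[src], op_to_subgraph[dst]
        if a != b and b not in succ[a]:
            succ[a].add(b)
            indeg[b] += 1
    # Kahn's algorithm: repeatedly emit subgraphs with no pending inputs.
    ready = deque(sorted(sg for sg, d in indeg.items() if d == 0))
    order = []
    while ready:
        sg = ready.popleft()
        order.append(sg)
        for nxt in sorted(succ[sg]):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indeg):
        raise ValueError("subgraph dependencies contain a cycle")
    return order
```

If two subgraphs end up depending on each other, no valid order exists and the sketch raises `ValueError`. A partitioner feeding this step therefore has to avoid producing such cycles.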

Abstract

Provided are a neural network computation graph processing method, a processing apparatus, a processing device and a readable storage medium. The processing method comprises: determining, according to attribute information of each operator in a neural network computation graph, a processing unit type corresponding to each operator; dividing the neural network computation graph into one or more computation sub-graphs according to the processing unit type corresponding to each operator and a dependency relationship between a plurality of operators, wherein each computation sub-graph corresponds to a processing unit; and processing the computation sub-graphs by means of processing units corresponding to the computation sub-graphs.

Description

Method, apparatus and processing device for processing a neural network computation graph

Technical Field
The present invention relates to the technical field of network communication, and in particular to a method for processing a neural network computation graph, a processing apparatus, a processing device and a readable storage medium.
Background
A neural network computation graph consists of various operators, which can be regarded as computational functions such as convolution and pooling.
A central processing unit (CPU) can run every operator. However, operators such as convolution and fully connected layers (e.g. 1024×4096) are computationally heavy, so running them on the CPU is slow and inefficient. In that case a dedicated accelerated processing unit (APU) can be used instead. An APU, however, usually supports only a limited set of instructions, so some special operators cannot run on it.
Therefore, how to allocate processors to the operators of a neural network is a technical problem that urgently needs to be solved.
Summary of the Invention
The present invention provides a method for processing a neural network computation graph, a processing apparatus, a processing device and a readable storage medium.
A first aspect provides a method for processing a neural network computation graph. The method comprises:
determining, from the attribute information of each operator in the neural network computation graph, the processor type corresponding to that operator;
dividing the neural network computation graph into one or more computation subgraphs according to the processor type of each operator and the dependencies among the operators, where all operators in a given computation subgraph correspond to one processor;
processing each computation subgraph with the processor corresponding to it.
A second aspect provides a processing apparatus. The apparatus comprises:
a determination module, configured to determine, from the attribute information of each operator in a neural network computation graph, the processor type corresponding to that operator;
a division module, configured to divide the neural network computation graph into one or more computation subgraphs according to the processor type of each operator and the dependencies among the operators, each computation subgraph corresponding to one processor;
a processing module, configured to process each computation subgraph with the processor corresponding to it.
A third aspect provides a processing device comprising a processor, a memory, and a program stored in the memory and executable on the processor. When executed by the processor, the program implements the steps of the method for processing a neural network computation graph according to the first aspect.
A fourth aspect provides a readable storage medium storing a program. When executed by a processor, the program implements the steps of the method for processing a neural network computation graph according to the first aspect.
The present disclosure provides a method, processing apparatus, processing device and readable storage medium for processing a neural network computation graph. In these technical solutions:
the processor type corresponding to each operator is determined from the operator's attribute information;
according to these processor types and the dependencies among the operators, the neural network computation graph is divided into one or more computation subgraphs, each corresponding to one processor;
each computation subgraph is processed by its corresponding processor.
This makes effective use of computing resources and increases the processing speed of the neural network computation graph. It also addresses slow processing of the graph when a particular type of processor cannot process certain operators, or processes them inefficiently.
Brief Description of the Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
FIG. 1 is a flowchart of a method for processing a neural network computation graph according to an embodiment of the present invention.
FIG. 2 is a flowchart of a specific implementation of dividing a neural network computation graph into one or more computation subgraphs.
FIG. 3 is a schematic diagram of a neural network computation graph.
FIG. 4 is a flowchart of a specific implementation of processing a computation subgraph with the processor corresponding to that subgraph.
FIG. 5 is a block diagram of a processing apparatus provided by an embodiment of the present invention.
FIG. 6 is a block diagram of a processing device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In the description and claims of the present invention, the term "comprising" and its variations are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed. It may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device. In addition, "and/or" in the description and claims denotes at least one of the connected objects. For example, "A and/or B" covers three cases: A alone, B alone, and both A and B.
In the embodiments of the present invention, words such as "exemplary" or "for example" denote an example, instance or illustration. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner.
FIG. 1 is a flowchart of a method for processing a neural network computation graph according to an embodiment of the present invention.
Referring to FIG. 1, an embodiment of the present invention provides a method for processing a neural network computation graph, comprising step 101, step 102, step 103 and step 104.
Step 101: obtain the attribute information of all operators in the neural network computation graph.
A neural network can be a computing network composed of operators (OPs). The neural network computation graph can consist of multiple operators, which can be regarded as computational functions such as convolution, pooling and fully connected operations. Optionally, the neural network is a tree-structured neural network model. The embodiments of the present invention do not limit the specific form of the neural network model.
In the embodiments of the present invention, each operator has its own computation and one or more attributes. The attribute information of an operator may include one or a combination of the following: computation amount, operator function and operator parameters.
Step 102: determine the processor type corresponding to each operator from the attribute information of that operator in the neural network computation graph.
In the embodiments of the present invention, processors (also called processing units) may be of the following types: central processing unit (CPU), accelerated processing unit (APU), graphics processing unit (GPU), tensor processing unit (TPU) and so on. The processor types are not limited to these and may include other kinds of processors.
In some embodiments, the computation amounts, operator functions and operator parameters handled by each type of processor are determined in advance, and a correspondence between these attributes and processor types is established. In step 102, the processor type for each operator of the neural network computation graph can then be looked up in this predetermined correspondence. The processor type corresponding to an operator is the type of processor used to process that operator.
For example, suppose the neural network computation graph includes operator 1 and operator 2.
The attribute information of operator 1 includes one or a combination of computation amount A1, operator function A2 and operator parameter A3. If, according to the correspondence, A1, A2 and A3 map to the central processing unit, then operator 1 corresponds to the central processing unit.
The attribute information of operator 2 includes one or a combination of computation amount B1, operator function B2 and operator parameter B3. If B1, B2 and B3 map to the accelerated processing unit, then operator 2 corresponds to the accelerated processing unit.
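The predetermined correspondence in step 102 can be pictured as a lookup from operator attributes to a processor type. The sketch below is an illustrative assumption only. The `Operator` fields, the set of APU-supported functions and the FLOP threshold are hypothetical values, not taken from the patent, and only two processor types (CPU and APU) are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    func: str           # operator function, e.g. "conv2d"
    flops: int          # computation amount
    params: tuple = ()  # operator parameters (kept hashable)


# Hypothetical correspondence table: functions the APU's instruction set supports.
APU_FUNCS = frozenset({"conv2d", "fully_connected", "pooling"})
# Hypothetical threshold above which offloading to the APU is worthwhile.
APU_MIN_FLOPS = 1_000_000


def processor_type(op: Operator) -> str:
    """Return the processor type for an operator based on its attributes."""
    if op.func in APU_FUNCS and op.flops >= APU_MIN_FLOPS:
        return "APU"  # heavy and supported: use the accelerator
    return "CPU"      # unsupported or too small: the CPU can run any operator
```

For example, a 1024×4096 fully connected operator maps to `"APU"`. An operator whose function is outside `APU_FUNCS` falls back to `"CPU"` regardless of its size.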
Step 103: divide the neural network computation graph into one or more computation subgraphs according to the processor type of each operator and the dependencies among the operators. All operators in a given computation subgraph correspond to one processor.
In the embodiments of the present invention, each resulting computation subgraph may correspond to one processor. For example, suppose the neural network computation graph is divided into computation subgraph 1 and computation subgraph 2. All operators in computation subgraph 1 correspond to the CPU, so they can be processed by the CPU. All operators in computation subgraph 2 correspond to the APU, so they can be processed by the APU.
The embodiments of the present invention do not limit the number of computation subgraphs into which the computation graph is divided.
Step 104: process each computation subgraph with the processor corresponding to it.
Specifically, the processor corresponding to a computation subgraph either compiles that subgraph, or compiles and runs it.
In the embodiments of the present invention, all operators in a computation subgraph correspond to the same processor type. That is, all operators of each computation subgraph are processed by a processor of the corresponding type.
In some embodiments, all operators in each computation subgraph correspond to a single processor of one processor type. Each computation subgraph is then processed by one processor of the corresponding type, which can effectively improve the processing efficiency of the computation graph.
In some embodiments, to save computing resources, computation subgraphs that share a processor type may be mapped to the same processor of that type, which then processes all of them. For example, if computation subgraph 1 and computation subgraph 2 both correspond to the central processing unit type, both can be processed by the same central processing unit.
In some embodiments, computation subgraphs that share a processor type may instead be mapped to different processors of that type, and multiple processors of that type process them. For example, if computation subgraph 1 and computation subgraph 2 both correspond to the central processing unit type, computation subgraph 1 can be processed by central processing unit 1 and computation subgraph 2 by central processing unit 2. Central processing units 1 and 2 are different processors of the same type.
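The two options above can be sketched as follows: one shared processor per type, or a separate processor for each same-type subgraph. The function name, the `share_same_type` flag and the instance naming scheme (`CPU0`, `CPU1`, ...) are hypothetical and serve only as an illustration.

```python
def assign_processors(subgraph_types, share_same_type=True):
    """Map each subgraph id to a processor instance of its processor type.

    subgraph_types: dict of subgraph id -> processor type, e.g. {"sg1": "CPU"}.
    share_same_type=True: all subgraphs of one type use the same instance.
    share_same_type=False: each subgraph gets a different instance of its type.
    """
    next_index = {}  # per-type counter of instances already handed out
    assignment = {}
    for sg, ptype in subgraph_types.items():
        if share_same_type:
            idx = 0
        else:
            idx = next_index.get(ptype, 0)
            next_index[ptype] = idx + 1
        assignment[sg] = f"{ptype}{idx}"
    return assignment
```

With `{"sg1": "CPU", "sg2": "CPU", "sg3": "APU"}`, the shared mode maps both CPU subgraphs to `CPU0`. The dedicated mode maps them to `CPU0` and `CPU1`.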
In the embodiments of the present invention, the processor type for each operator can be determined from the operator's attribute information. The neural network computation graph can then be divided into one or more computation subgraphs according to these processor types and the dependencies among the operators, and each subgraph is processed by the one processor it corresponds to. This makes effective use of computing resources and increases the processing speed of the neural network computation graph. It also addresses slow processing when a particular type of processor cannot process certain operators, or processes them inefficiently.
FIG. 2 is a flowchart of a specific implementation of dividing a neural network computation graph into one or more computation subgraphs. Referring to FIG. 2, in some embodiments step 103 comprises steps 201 to 203.
Step 201: traverse each computation branch of the neural network computation graph along a first direction. Each computation branch includes multiple nodes, and each node corresponds to one operator.
In some embodiments, the neural network computation graph may include multiple computation branches. Each computation branch may include multiple nodes, and each node corresponds to one operator in the computation graph.
Step 202: for the current computation branch, traverse each node on that branch along a second direction. The first direction intersects the second direction.
The second direction is the reverse of the processing order of the operators in the computation graph. For example, if the operators are processed from top to bottom, the second direction is from bottom to top. Correspondingly, the first direction may be from left to right or from right to left.
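Steps 201 and 202 can be illustrated on a tree-shaped graph stored as a map from each node to its children, ordered left to right. The sketch makes several assumptions: each computation branch is taken to be a root-to-leaf path, branches are enumerated left to right (first direction), and each branch is listed bottom-up (second direction). The representation and the function name `branches` are also assumptions.

```python
def branches(children, root):
    """Return the computation branches, leftmost first, each listed bottom-up.

    children: dict mapping a node to its child nodes ordered left to right.
    The root is the top node; leaves have no entry (or an empty list).
    """
    result = []

    def walk(node, path):
        path = path + [node]           # top-down path from the root
        kids = children.get(node, [])
        if not kids:                   # leaf: one complete branch
            result.append(path[::-1])  # reverse so the branch reads bottom-up
        for kid in kids:               # first direction: left to right
            walk(kid, path)

    walk(root, [])
    return result
```

In this sketch, nodes shared by several branches (such as the root) appear in each of them. The second division rule decides when such a shared node is actually visited.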
Step 203: based on the traversal order, divide the neural network computation graph into one or more computation subgraphs according to the processor type of each node, the dependencies among the nodes and a preset division rule.
The dependencies among the nodes (operators) may include the hierarchical relationship of the operators in the neural network and/or the processing order of the operators.
Taking a tree-structured neural network as an example, the dependencies may include parent-child relationships, sibling relationships and the like.
In a node tree, a parent node has child nodes, and child nodes at the same level are called sibling nodes. The top node is called the root node. Every node other than the root has a parent node.
In the embodiments of the present invention, the processing order of the nodes (operators) is either the compilation order or the compilation and execution order of the operators in the neural network computation graph.
In step 203, the traversal order is the one determined by steps 201 and 202: the computation branches are traversed along the first direction, and the nodes of the current branch along the second direction. Each computation branch and each of its nodes are visited in turn in this order, which determines the computation subgraph of every node and thereby divides the graph into one or more computation subgraphs.
In some embodiments, the preset division rule may include a first division rule.
First division rule: for the current node in the current computation branch, if the current node and the previous node correspond to the same processor type, the current node is assigned to the computation subgraph containing the previous node. If they correspond to different processor types, the current node and the previous node are assigned to different computation subgraphs.
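Applied along a single branch, the first division rule amounts to grouping consecutive nodes that share a processor type. Below is a minimal sketch for one branch visited bottom-up, using hypothetical names. It deliberately leaves out the interaction with other branches (the second and third division rules).

```python
def split_branch(branch, ptype):
    """Apply the first division rule along one branch.

    branch: node names in traversal order (bottom-up).
    ptype: dict mapping node name -> processor type.
    Returns a list of subgraphs, each a list of node names.
    """
    subgraphs = []
    for node in branch:
        prev = subgraphs[-1][-1] if subgraphs else None
        if prev is not None and ptype[prev] == ptype[node]:
            subgraphs[-1].append(node)  # same type as previous node: join its subgraph
        else:
            subgraphs.append([node])    # first node, or type changed: new subgraph
    return subgraphs
```

With `N6` and `N5` both mapped to the APU, `split_branch(["N6", "N5"], ...)` yields `[["N6", "N5"]]`. This matches how the FIG. 3 walkthrough forms computation subgraph A from those two nodes.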
If the current node is the first node of the first computation branch in the traversal order, it has no previous node. It is directly assigned to a new computation subgraph, and traversal proceeds to the next node. Otherwise, following the traversal order, the current node and its previous node are in a parent-child relationship, with the current node as the parent and the previous node as the child.
In the embodiments of the present invention, different computation subgraphs contain different nodes; that is, no node belongs to more than one computation subgraph.
In some embodiments, the preset division rule may further include a second division rule.
Second division rule: suppose the next node of the current node has multiple child nodes in the neural network computation graph. If the current node is not the last child of that next node along the first direction, the next node is not traversed yet, and traversal moves on to the nodes of the next computation branch. If the current node is the last child of that next node along the first direction, traversal continues with that next node in the current computation branch.
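Read together with steps 201 and 202, the second division rule means that a node with several children is visited only when the branch through its last (rightmost) child reaches it. Every parent is therefore visited after all of its children. The sketch below derives the visit order under that reading, and the result coincides with a left-to-right post-order traversal. The tree representation and the function name are hypothetical.

```python
def visit_order(children, root):
    """Visit order produced by the bottom-up branch walk plus the second division rule.

    children: dict mapping a node to its child nodes ordered left to right.
    """
    parent = {c: p for p, kids in children.items() for c in kids}

    # Collect leaves left to right; each leaf starts one computation branch.
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if not kids:
            leaves.append(node)
        stack.extend(reversed(kids))  # so the leftmost child is popped first

    order = []
    for leaf in leaves:
        node = leaf
        while True:
            order.append(node)
            up = parent.get(node)
            # Stop at the root, or when `node` is not the last child of `up`:
            # `up` will be reached later from the branch through its last child.
            if up is None or children[up][-1] != node:
                break
            node = up
    return order
```

For a root `a` with children `b` and `c`, where `b` has the single child `d`, the order is `d, b, c, a`. Node `a` is deferred until the branch through `c` reaches it.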
Following the traversal order, the current node and its next node are in a parent-child relationship, with the next node as the parent and the current node as the child.
In some embodiments, the preset division rule may further include a third division rule.
Third division rule: suppose the computation subgraph containing the current node has multiple parent nodes in the neural network computation graph. If at least one of these parent nodes corresponds to a processor type different from that of the current node, every one of these parent nodes is determined to belong to a computation subgraph different from that of the current node.
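The third division rule can be expressed as a predicate over the current node's subgraph. This hypothetical sketch allows the graph to be a DAG, so `parents_of` maps each node to a list of parents. A parent of the subgraph is any parent of one of its nodes that lies outside the subgraph. When the predicate returns `True`, none of those parents may join the current node's subgraph.

```python
def parents_must_split(subgraph, current, parents_of, ptype):
    """Third division rule check (hypothetical helper).

    subgraph: set of nodes in the computation subgraph containing `current`.
    parents_of: dict mapping a node to a list of its parent nodes.
    ptype: dict mapping a node to its processor type.
    """
    outside_parents = {
        p for n in subgraph for p in parents_of.get(n, []) if p not in subgraph
    }
    if len(outside_parents) < 2:
        return False  # the rule only applies when the subgraph has multiple parents
    # Any parent on a different processor type forces all parents out.
    return any(ptype[p] != ptype[current] for p in outside_parents)
```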
可以理解的是,在本发明实施例中可以基于上述划分规则,将神经网络计算图划分为一个或多个计算子图。It can be understood that, in this embodiment of the present invention, the neural network computation graph may be divided into one or more computation subgraphs based on the foregoing division rules.
比如,神经网络计算图中包括多个分支,各个分支上的父节点的算子和子节点的算子对应的处理器类型相同,则可以将该神经网络计算图划分为一个计算子图,也就是说,该计算子图就是该神经网络计算图。比如,神经网络计算图中包括多个分支,各个分支上的父节点的算子和子节点的算子对应的处理器类型并不完全相同,则可以基于上述划分规则将该神经网络计算图划分为多个计算子图。For example, if the neural network computation graph includes multiple branches, and the operator of the parent node and the operator of the child node on each branch have the same processor type, the neural network computation graph can be divided into a computation subgraph, that is, Say, the computation subgraph is the neural network computation graph. For example, the neural network computation graph includes multiple branches, and the operator of the parent node and the operator of the child node on each branch have different processor types. Based on the above division rules, the neural network computation graph can be divided into Multiple computational subgraphs.
FIG. 3 is a schematic diagram of a neural network computation graph. As an example, the neural network computation graph includes nodes N1 to N9, the nodes are processed in top-to-bottom order, and the dependencies among the nodes are as shown in FIG. 3. The preset division rules of the embodiments of the present invention are illustrated below, taking the bottom-to-top direction as the second direction and the left-to-right direction as the first direction.
Referring to FIG. 3, as an example, assume that, according to step 101 and step 102, the processor type corresponding to the operator of node N7 is determined to be CPU, and the processor type corresponding to the operators of the other nodes (N1 to N6, N8, and N9) is determined to be APU. All operators in the network are then divided based on the processor type corresponding to each operator, the dependencies among the operators, and the preset division rules.
In the traversal order, the first computation branch is the leftmost branch, namely the branch formed by nodes N6, N5, N4, N3, and N2. The first node of this branch in the traversal order is N6. Since N6 is the first node of the first computation branch, node N6 is placed in a computation subgraph of its own. Traversal continues with the next node, N5. Node N5 is the parent node of node N6, and N5 and N6 correspond to the same processor type, so, according to the first division rule, N5 and N6 are placed in the same computation subgraph, which is defined as computation subgraph A.
Continuing in the traversal order with the next node on the first computation branch after N5, namely N4: node N4 is the parent node of N5, and N4 and N5 correspond to the same processor type, so, according to the first division rule, node N4 is placed in computation subgraph A.
Continuing with the next node on the first computation branch after N4, namely N3: node N3 is the parent node of N4, and the processor type of N3 is the same as that of computation subgraph A, so, according to the first division rule, node N3 is placed in computation subgraph A.
In the traversal order, the parent node of node N3 on the first computation branch is node N2. Node N2 has multiple child nodes (N3, N7, and N9), and N3 is not the last child node of N2 along the first direction; the last one is N9. Therefore, according to the second division rule, node N2 is not traversed on the first computation branch; instead, traversal moves on to the nodes of the next computation branch, namely the branch formed by nodes N6, N8, N7, and N2.
On the branch formed by nodes N6, N8, N7, and N2, node N6 has already been traversed and is not traversed again. In the traversal order, node N8 is traversed next. The processor type of N8 is the same as that of N6, that is, the same as that of computation subgraph A, so, according to the first division rule, node N8 is placed in computation subgraph A.
Continuing in the traversal order with the next node after N8, namely N7: the processor type of N7 differs from that of N8, so, according to the first division rule, node N7 is placed in a separate computation subgraph, defined as computation subgraph B.
In the traversal order, the parent node of N7, namely N2, has multiple child nodes (N3, N7, and N9), and N7 is not the last child node of N2 along the first direction. According to the second division rule, node N2 is not traversed; instead, traversal continues with the nodes of the next computation branch, namely the branch formed by nodes N6, N8, N9, N2, and N1.
On the branch formed by nodes N6, N8, N9, N2, and N1, nodes N6 and N8 have already been traversed and are not traversed again. In the traversal order, node N9 is traversed next. Computation subgraph A has three parent nodes, namely N2, N7, and N9, and the processor type of N7 (CPU) differs from that of computation subgraph A (APU). According to the third division rule, each of nodes N2, N7, and N9 must belong to a different computation subgraph from computation subgraph A. Therefore, although N9 has the same processor type as computation subgraph A, N9 is not placed in computation subgraph A; it is placed in a separate computation subgraph, defined as computation subgraph C.
In the traversal order, the parent node of N9, namely N2, has multiple child nodes (N3, N7, and N9), and N9 is the last child node of N2 along the first direction. Therefore, according to the second division rule, node N2 is traversed. Since N2 and N9 correspond to the same processor type, node N2 is placed in computation subgraph C according to the first division rule.
In the traversal order, root node N1 has the same processor type as node N2, so, according to the first division rule, node N1 is placed in computation subgraph C.
At this point, the graph has been divided into computation subgraphs A, B, and C. Computation subgraphs A and C correspond to the same processor type, APU, while computation subgraph B corresponds to CPU.
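The walkthrough above can be reproduced with a short Python sketch. It is one illustrative reading of the three division rules, not the implementation of this application. Assumptions: FIG. 3 is encoded as (parent, child) edges listed left to right (the first direction); traversal starts at the output node N6 and moves bottom to top (the second direction) by a depth-first walk over parent nodes; and the third rule is interpreted, following the N9 step above, as a check on the external parent nodes of the subgraph that the current node would otherwise join. The names `EDGES`, `PTYPE`, and `partition` are hypothetical.

```python
from collections import defaultdict

# FIG. 3 encoded as (parent, child) edges, listed left to right (first direction).
EDGES = [("N1", "N2"), ("N2", "N3"), ("N2", "N7"), ("N2", "N9"),
         ("N3", "N4"), ("N4", "N5"), ("N5", "N6"),
         ("N7", "N8"), ("N9", "N8"), ("N8", "N6")]
# Processor types assumed in the example: only N7 maps to the CPU.
PTYPE = {n: "APU" for n in ("N1", "N2", "N3", "N4", "N5", "N6", "N8", "N9")}
PTYPE["N7"] = "CPU"


def partition(edges, ptype, sink):
    """Return the computation subgraphs (sorted node lists) in creation order."""
    parents, children = defaultdict(list), defaultdict(list)
    for p, c in edges:
        parents[c].append(p)
        children[p].append(c)
    group = {}    # node -> subgraph index
    members = []  # subgraph index -> set of nodes

    def external_parents(g):
        # Parent nodes of subgraph g that lie outside g.
        return {p for n in members[g] for p in parents[n] if p not in members[g]}

    def assign(node, prev):
        if prev is not None and ptype[node] == ptype[prev]:  # first rule
            g = group[prev]
            ext = external_parents(g)
            # Third rule (as interpreted here): several parents, one on another type.
            blocked = len(ext) > 1 and any(ptype[p] != ptype[node] for p in ext)
            if not blocked:
                group[node] = g
                members[g].add(node)
                return
        group[node] = len(members)  # start a new computation subgraph
        members.append({node})

    def visit(node, prev):
        assign(node, prev)
        for p in parents[node]:      # parent nodes in left-to-right order
            if p in group:
                continue             # already traversed on an earlier branch
            if children[p][-1] != node:
                continue             # second rule: wait for p's last child
            visit(p, node)           # move up (second direction)

    visit(sink, None)
    return [sorted(m) for m in members]


for label, nodes in zip("ABC", partition(EDGES, PTYPE, sink="N6")):
    print(label, nodes, PTYPE[nodes[0]])
```

This prints `A ['N3', 'N4', 'N5', 'N6', 'N8'] APU`, `B ['N7'] CPU`, and `C ['N1', 'N2', 'N9'] APU`, matching subgraphs A, B, and C. Setting N7 to APU as well collapses the whole graph into a single subgraph, which corresponds to the single-subgraph case described earlier.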
In some embodiments, computation subgraphs A and C may be processed by the same APU, while computation subgraph B is processed by the CPU. In other embodiments, computation subgraph A may be processed by one APU, computation subgraph C by another APU, and computation subgraph B by the CPU.
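The two embodiments above differ only in how subgraphs of the same processor type are bound to processor instances. A minimal Python sketch of both bindings follows. It assumes hypothetical instance names of the form `APU0`, `APU1`, `CPU0` and a hypothetical `bind_processors` helper. Neither name comes from this application.

```python
from collections import defaultdict

# Subgraph -> processor type, as obtained for FIG. 3.
SUBGRAPH_TYPES = {"A": "APU", "B": "CPU", "C": "APU"}


def bind_processors(subgraph_types, share_same_type=True):
    """Map each computation subgraph to a processor instance name such as 'APU0'."""
    counters = defaultdict(int)
    binding = {}
    for sg in sorted(subgraph_types):
        ptype = subgraph_types[sg]
        if share_same_type:
            binding[sg] = f"{ptype}0"                  # one instance per type
        else:
            binding[sg] = f"{ptype}{counters[ptype]}"  # a fresh instance per subgraph
            counters[ptype] += 1
    return binding


print(bind_processors(SUBGRAPH_TYPES))                         # A and C share APU0
print(bind_processors(SUBGRAPH_TYPES, share_same_type=False))  # A -> APU0, C -> APU1
```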
Dividing the computation graph into subgraphs according to the above rules makes it easier to compile and run the graph in processing order and to configure the input and output relationships between processors. In addition, each computation subgraph corresponds to one processor, whose type is determined from the attribute information of the operators in that subgraph. This makes effective use of computing resources, increases the processing speed of the neural network computation graph, and effectively addresses the slow processing that results when a particular type of processor cannot process certain operators, or processes them inefficiently.
FIG. 4 is a flowchart of a specific implementation of processing a computation subgraph by the processor corresponding to that subgraph. In some embodiments, as shown in FIG. 4, the step of processing a computation subgraph by its corresponding processor, namely step 104, may further include steps 401 to 403.
Step 401: Determine the dependencies between the computation subgraphs according to the dependencies among the operators in the neural network computation graph.
Step 402: Obtain the processing order of the computation subgraphs according to the dependencies between them.
The dependencies between computation subgraphs represent their hierarchical relationship, or processing order, within the computation graph. Based on these dependencies, the compilation order, or the compilation and execution order, of the computation subgraphs can be determined.
For example, as shown in FIG. 3, computation subgraphs A, B, and C are obtained by the division described above. From the dependencies among the operators of subgraphs C, B, and A, the output of computation subgraph C is connected to node N7 of computation subgraph B and to nodes N3 and N8 of computation subgraph A, and the output of computation subgraph B is connected to node N8 of computation subgraph A. Computation subgraph C must therefore be processed first, then B, then A.
Step 403: Process each computation subgraph by its corresponding processor, according to the processing order of the computation subgraphs.
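Steps 401 to 403 can be sketched in Python with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The sketch assumes the FIG. 3 edges and the A/B/C division from the example above. `subgraph_dependencies`, `SUBGRAPH`, and `PROCESSOR` are illustrative names, and "running" a subgraph is represented only by a print statement.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# FIG. 3 edges (parent, child) and the subgraph of each node from the division above.
EDGES = [("N1", "N2"), ("N2", "N3"), ("N2", "N7"), ("N2", "N9"),
         ("N3", "N4"), ("N4", "N5"), ("N5", "N6"),
         ("N7", "N8"), ("N9", "N8"), ("N8", "N6")]
SUBGRAPH = {"N3": "A", "N4": "A", "N5": "A", "N6": "A", "N8": "A",
            "N7": "B", "N1": "C", "N2": "C", "N9": "C"}
PROCESSOR = {"A": "APU", "B": "CPU", "C": "APU"}


def subgraph_dependencies(edges, subgraph):
    """Step 401: lift operator-level edges to subgraph-level predecessor sets."""
    deps = {sg: set() for sg in subgraph.values()}
    for parent, child in edges:
        if subgraph[parent] != subgraph[child]:
            deps[subgraph[child]].add(subgraph[parent])
    return deps


deps = subgraph_dependencies(EDGES, SUBGRAPH)
order = list(TopologicalSorter(deps).static_order())  # step 402
for sg in order:                                      # step 403
    print(f"run subgraph {sg} on {PROCESSOR[sg]}")
```

Each operator-level edge that crosses a subgraph boundary becomes a subgraph-level dependency. Here A depends on B and C, and B depends on C, so the order is C, B, A. If a division ever produced a cycle between subgraphs, `static_order()` would raise `graphlib.CycleError`. That makes it a useful sanity check on the division rules.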
In the embodiments of the present invention, the operators are divided into one or more computation subgraphs according to their corresponding processor types and the dependencies among them, and the computation subgraphs are processed according to the dependencies between them. This makes it easier to configure the input and output relationships between processors, ensures the correctness of operator computation in the neural network, and improves computational efficiency.
FIG. 5 is a block diagram of a processing apparatus according to an embodiment of the present invention.
Referring to FIG. 5, an embodiment of the present invention provides an apparatus 500 for processing a neural network computation graph. The apparatus 500 includes an acquisition module 501, a determination module 502, a division module 503, and a processing module 504.
The acquisition module 501 is configured to acquire the attribute information of all operators in the neural network computation graph.
In some embodiments, the attribute information of an operator includes one or a combination of the following: computation amount, operator function, and operator parameters.
The determination module 502 is configured to determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network computation graph. The division module 503 is configured to divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators, with each computation subgraph corresponding to one processor. The processing module 504 is configured to process each computation subgraph by the processor corresponding to it.
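One possible rule-based mapping from the three kinds of attribute information above (computation amount, operator function, operator parameters) to a processor type is sketched below in Python. This is an assumption for illustration, not the determination logic of this application. Everything concrete in it is hypothetical: the `APU_FUNCTIONS` capability set, the `MIN_APU_FLOPS` threshold, the `dynamic_shape` parameter, and the `OperatorAttrs`/`processor_type` names.

```python
from dataclasses import dataclass, field

# Hypothetical capability table and threshold; real values depend on the hardware.
APU_FUNCTIONS = {"conv2d", "matmul", "relu", "add", "maxpool"}
MIN_APU_FLOPS = 10_000


@dataclass
class OperatorAttrs:
    function: str                               # operator function
    flops: int                                  # computation amount
    params: dict = field(default_factory=dict)  # operator parameters


def processor_type(attrs: OperatorAttrs) -> str:
    """Pick a processor type from an operator's attribute information."""
    if attrs.function not in APU_FUNCTIONS:
        return "CPU"  # the accelerator cannot execute this function
    if attrs.flops < MIN_APU_FLOPS:
        return "CPU"  # too little work to justify offloading
    if attrs.params.get("dynamic_shape", False):
        return "CPU"  # hypothetical parameter the accelerator cannot handle
    return "APU"


ops = {
    "conv1": OperatorAttrs("conv2d", 2_000_000),
    "nms1": OperatorAttrs("nms", 50_000),  # function not in the capability set
    "add1": OperatorAttrs("add", 100),     # below the computation threshold
}
print({name: processor_type(a) for name, a in ops.items()})
```

The checks are ordered from hard constraints (unsupported function) to efficiency heuristics (small computation amount), so that an operator the accelerator cannot run is never offloaded regardless of its size.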
In some embodiments, all operators in a computation subgraph correspond to the same processor type. In some embodiments, multiple computation subgraphs that correspond to the same processor type correspond to the same processor.
In some embodiments, the division module 503 is further configured to: traverse the computation branches of the neural network computation graph along a first direction, where each computation branch includes multiple nodes and each node corresponds to one operator; for the current computation branch, traverse the nodes on that branch along a second direction, the first direction and the second direction intersecting each other; and, based on the traversal order, divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each node, the dependencies among the nodes, and the preset division rules.
In some embodiments, the preset division rules include a first division rule.
First division rule: for the current node in the current computation branch, if the current node and the previous node correspond to the same processor type, the current node is placed in the computation subgraph of the previous node; if they correspond to different processor types, the current node and the previous node are placed in different computation subgraphs. No node belongs to more than one computation subgraph.
In some embodiments, the preset division rules further include a second division rule.
Second division rule: if the next node of the current node has multiple child nodes in the neural network computation graph and the current node is not the last child node of that next node along the first direction, the nodes of the next computation branch are traversed; if the next node has multiple child nodes and the current node is the last child node of that next node along the first direction, traversal continues with that next node on the current computation branch.
In some embodiments, the preset division rules further include a third division rule.
Third division rule: if the computation subgraph in which the current node is located has multiple parent nodes in the neural network computation graph, and the processor type of at least one of those parent nodes differs from the processor type of the current node, then each of those parent nodes is determined to belong to a different computation subgraph from the current node.
In some embodiments, the processing module 504 is further configured to: determine the dependencies between the computation subgraphs according to the dependencies among the operators in the neural network computation graph; obtain the processing order of the computation subgraphs according to those dependencies; and process each computation subgraph by its corresponding processor according to the processing order.
The processing apparatus 500 provided by the embodiments of the present invention can implement each step and process of the processing method described above and achieve the same technical effects; to avoid repetition, details are not repeated here.
FIG. 6 is a block diagram of a processing device according to an embodiment of the present invention.
As shown in FIG. 6, an embodiment of the present invention further provides a processing device 600, which includes a processor 601, a memory 602, and a program or instructions stored in the memory 602 and executable on the processor 601. When executed by the processor 601, the program or instructions implement each step and process of the processing method described above and achieve the same technical effects; to avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a readable storage medium storing a program or instructions which, when executed by a processor, implement each step and process of the processing method described above and achieve the same technical effects; to avoid repetition, details are not repeated here.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include them.

Claims (20)

  1. A method for processing a neural network computation graph, comprising:
    determining, according to attribute information of each operator in the neural network computation graph, a processor type corresponding to each operator;
    dividing the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators, wherein all operators in each computation subgraph correspond to one processor; and
    processing each computation subgraph by the processor corresponding to that computation subgraph.
  2. The method according to claim 1, wherein all operators in each computation subgraph correspond to one processor of one processor type.
  3. The method according to claim 1, wherein multiple computation subgraphs correspond to the same processor of one processor type.
  4. The method according to claim 1, wherein the step of dividing the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators comprises:
    traversing the computation branches of the neural network computation graph along a first direction, each computation branch comprising a plurality of nodes and each node corresponding to one operator;
    for the current computation branch, traversing the nodes on the current computation branch along a second direction, the first direction and the second direction intersecting each other; and
    dividing, based on the traversal order, the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each node, the dependencies among the nodes, and preset division rules.
  5. The method according to claim 4, wherein the preset division rules comprise:
    for the current node in the current computation branch, if the current node and the previous node correspond to the same processor type, placing the current node in the computation subgraph of the previous node;
    if the current node and the previous node correspond to different processor types, placing the current node and the previous node in different computation subgraphs;
    wherein no node is shared between different computation subgraphs.
  6. The method according to claim 5, wherein the preset division rules further comprise:
    if the next node of the current node has multiple child nodes in the neural network computation graph and the current node is not the last child node of that next node along the first direction, traversing the nodes of the next computation branch;
    if the next node of the current node has multiple child nodes in the neural network computation graph and the current node is the last child node of that next node along the first direction, continuing to traverse that next node in the current computation branch.
  7. The method according to claim 5 or 6, wherein the preset division rules further comprise:
    if the computation subgraph in which the current node is located has multiple parent nodes in the neural network computation graph, and the processor type of at least one of those parent nodes differs from the processor type of the current node, determining that each of the parent nodes belongs to a different computation subgraph from the current node.
  8. The method according to claim 1, wherein the step of processing each computation subgraph by the processor corresponding to that computation subgraph comprises:
    determining the dependencies between the computation subgraphs according to the dependencies among the operators in the neural network computation graph;
    obtaining a processing order of the computation subgraphs according to the dependencies between them; and
    processing, according to the processing order, each computation subgraph by the processor corresponding to it.
  9. The method according to any one of claims 1 to 5, wherein the attribute information of an operator comprises one or a combination of the following: computation amount, operator function, and operator parameters.
  10. An apparatus for processing a neural network computation graph, comprising:
    a determination module configured to determine, according to attribute information of each operator in the neural network computation graph, a processor type corresponding to each operator;
    a division module configured to divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the operators, each computation subgraph corresponding to one processor; and
    a processing module configured to process each computation subgraph by the processor corresponding to that computation subgraph.
  11. The apparatus according to claim 10, wherein all operators in each computation subgraph correspond to one processor of one processor type.
  12. The apparatus according to claim 10, wherein multiple computation subgraphs correspond to the same processor of one processor type.
  13. The apparatus according to claim 10, wherein the division module is configured to: traverse the computation branches of the neural network computation graph along a first direction, each computation branch comprising a plurality of nodes and each node corresponding to one operator; for the current computation branch, traverse the nodes on the current computation branch along a second direction, the first direction and the second direction intersecting each other; and, based on the traversal order, divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each node, the dependencies among the nodes, and preset division rules.
  14. The apparatus according to claim 13, wherein the preset division rule comprises:
    for the current node in the current computation branch, if the current node and the previous node correspond to the same processor type, assigning the current node to the computation subgraph containing the previous node;
    if the current node and the previous node correspond to different processor types, assigning the current node and the previous node to different computation subgraphs;
    wherein no node is shared between different computation subgraphs.
  15. The apparatus according to claim 14, wherein the preset division rule comprises:
    if the next node after the current node has a plurality of child nodes in the neural network computation graph, and the current node is not the last child node of that next node along the first direction, traversing the nodes of the next computation branch;
    if the next node after the current node has a plurality of child nodes in the neural network computation graph, and the current node is the last child node of that next node along the first direction, continuing the traversal to that next node in the current computation branch.
  16. The apparatus according to claim 14 or 15, wherein the preset division rule further comprises:
    if the computation subgraph containing the current node has a plurality of parent nodes in the neural network computation graph, and at least one of those parent nodes corresponds to a processor type different from that of the current node, determining that each of the parent nodes belongs to a computation subgraph different from that of the current node.
  17. The apparatus according to claim 10, wherein the processing module is configured to: determine the dependencies among the computation subgraphs according to the dependencies among the operators in the neural network computation graph; obtain a processing order of the computation subgraphs according to the dependencies among them; and, in that processing order, process each computation subgraph by means of the processor corresponding to it.
  18. The apparatus according to any one of claims 10-14, wherein the attribute information of an operator comprises one or a combination of the following: computation amount, operator function, and operator parameters.
  19. A processing device, comprising: a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for processing a neural network computation graph according to any one of claims 1 to 9.
  20. A readable storage medium storing a program that, when executed by a processor, implements the steps of the method for processing a neural network computation graph according to any one of claims 1 to 9.
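Claims 10 and 18 name the attribute information (computation amount, operator function, operator parameters) used to pick a processor type, but leave the actual mapping open. The Python sketch below shows one hypothetical policy only: the `Operator` fields, the `CPU_ONLY_FUNCTIONS` set, the `dynamic_shape` parameter key, the `ACCELERATOR_THRESHOLD` value and the "CPU"/"NPU" labels are all illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    function: str                               # operator function, e.g. "conv2d"
    flops: int                                  # computation amount
    params: dict = field(default_factory=dict)  # operator parameters


# Hypothetical policy values, chosen only for illustration.
CPU_ONLY_FUNCTIONS = {"nms", "topk", "where"}
ACCELERATOR_THRESHOLD = 1_000_000  # FLOPs


def determine_processor_type(op: Operator) -> str:
    """Map an operator's attribute information to a processor type label."""
    if op.function in CPU_ONLY_FUNCTIONS:
        return "CPU"  # assumed: function unsupported on the accelerator
    if op.params.get("dynamic_shape", False):
        return "CPU"  # assumed: parameters the accelerator cannot handle
    if op.flops >= ACCELERATOR_THRESHOLD:
        return "NPU"  # compute-heavy operators go to the accelerator
    return "CPU"      # light operators stay on the general-purpose core
```

Any real system would replace these rules with whatever its hardware and compiler actually support; the point is only that each operator's type is decided from its own attributes before any division takes place.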
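As a rough illustration of claims 13 to 16, the sketch below assumes the graph is given as successor and input lists, treats list order as the "first direction" (left to right across branches) and following successors as the "second direction" (along one branch). `traversal_order` defers a node with several inputs until its other inputs have been visited, which is how it reads "the current node is not the last child node". `partition` applies the same-type rule of claim 14 against the node the walk arrived from (for the first node of a new branch, the fork node is assumed to be the "previous node"), plus one possible reading of claim 16: a node with several inputs, at least one of another processor type, opens a new subgraph. All function and parameter names are hypothetical, and this sketch does not by itself guarantee that the resulting subgraphs depend on each other acyclically.

```python
def traversal_order(succs, inputs, sources):
    """Return (node, previous_node) pairs in the order the branches are walked.

    succs:   node -> successor nodes, ordered along the first direction
    inputs:  node -> input nodes
    sources: graph entry nodes, ordered along the first direction
    """
    visited, order = set(), []

    def ready(n):
        # A node is entered only once all of its inputs have been visited.
        return all(p in visited for p in inputs.get(n, []))

    # Pending branch starts; the leftmost pending branch is on top.
    stack = [(s, None) for s in reversed(sources)]
    while stack:
        node, prev = stack.pop()
        if node in visited or not ready(node):
            continue  # it will be reached again from its last input
        while node is not None:  # walk one branch along the second direction
            visited.add(node)
            order.append((node, prev))
            nxt = succs.get(node, [])
            for s in reversed(nxt[1:]):  # a fork opens further branches
                stack.append((s, node))
            if nxt and ready(nxt[0]):
                prev, node = node, nxt[0]
            else:
                # Branch ends, or the next node still waits for other inputs
                # (the current node is not its last input): switch branches.
                node = None
    return order


def partition(succs, inputs, sources, ptype):
    """Group nodes into subgraphs; ptype maps node -> processor type."""
    subgraph_of, subgraphs = {}, []
    for node, prev in traversal_order(succs, inputs, sources):
        ins = inputs.get(node, [])
        # One reading of claim 16: several inputs with mixed processor types.
        mixed = len(ins) > 1 and any(ptype[p] != ptype[node] for p in ins)
        if prev is not None and ptype[prev] == ptype[node] and not mixed:
            sg = subgraph_of[prev]  # claim 14: same type -> same subgraph
        else:
            sg = len(subgraphs)  # type change -> start a new subgraph
            subgraphs.append([])
        subgraph_of[node] = sg
        subgraphs[sg].append(node)
    return subgraphs
```

For a diamond a->b, a->c, b->d, c->d with b on a CPU and the other nodes on an NPU, the walk visits a, b, c, d and the division yields the subgraphs [a, c], [b] and [d].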
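Claim 17 derives the processing order of the computation subgraphs from the dependencies among their operators. A minimal sketch, assuming subgraphs are lists of node names and `succs` maps each node to its successors: every edge between nodes in different subgraphs becomes a subgraph-level edge, Kahn's topological sort produces the processing order (raising if the subgraph dependencies form a cycle), and `run` dispatches each subgraph to a hypothetical per-type `processors` callable. These names are illustrative, not from the patent.

```python
from collections import deque


def subgraph_order(subgraphs, succs):
    """Return subgraph indices in a dependency-respecting processing order."""
    owner = {n: i for i, sg in enumerate(subgraphs) for n in sg}
    # Subgraph i -> subgraphs that consume an output of subgraph i.
    deps = {i: set() for i in range(len(subgraphs))}
    for src, dsts in succs.items():
        for dst in dsts:
            if owner[src] != owner[dst]:
                deps[owner[src]].add(owner[dst])
    indegree = {i: 0 for i in deps}
    for targets in deps.values():
        for j in targets:
            indegree[j] += 1
    # Kahn's algorithm; sorting keeps the order deterministic.
    queue = deque(sorted(i for i, d in indegree.items() if d == 0))
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in sorted(deps[i]):
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    if len(order) != len(subgraphs):
        raise ValueError("subgraph dependencies contain a cycle")
    return order


def run(subgraphs, succs, ptype, processors):
    """Process each subgraph, in order, on the processor for its type."""
    for i in subgraph_order(subgraphs, succs):
        processors[ptype[subgraphs[i][0]]](subgraphs[i])
```

Since every operator in a subgraph shares one processor type (claim 11), looking up the type of the subgraph's first node is enough to pick its processor.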
PCT/CN2021/109499 2020-07-31 2021-07-30 Neural network computation graph processing method and apparatus, and processing device WO2022022670A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010757561.1 2020-07-31
CN202010757561.1A CN111860820A (en) 2020-07-31 2020-07-31 Neural network operator dividing method and device and dividing equipment

Publications (1)

Publication Number Publication Date
WO2022022670A1 (en) 2022-02-03

Family

ID=72953458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109499 WO2022022670A1 (en) 2020-07-31 2021-07-30 Neural network computation graph processing method and apparatus, and processing device

Country Status (2)

Country Link
CN (1) CN111860820A (en)
WO (1) WO2022022670A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691330A (en) * 2022-03-28 2022-07-01 北京百度网讯科技有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN115268877A (en) * 2022-09-27 2022-11-01 之江实验室 Intermediate representation method and device for parallel execution of graph computation
CN115796228A (en) * 2022-11-15 2023-03-14 北京百度网讯科技有限公司 Operator fusion method, device, equipment and storage medium
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation
WO2023207361A1 (en) * 2022-04-26 2023-11-02 山东云海国创云计算装备产业创新中心有限公司 Memory management method, system, device and computer readable storage medium
CN117576125A (en) * 2024-01-16 2024-02-20 芯瞳半导体技术(山东)有限公司 Neural network calculation graph segmentation method, device, equipment and storage medium

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
CN111860820A (en) * 2020-07-31 2020-10-30 北京灵汐科技有限公司 Neural network operator dividing method and device and dividing equipment
CN114626284A (en) * 2020-12-14 2022-06-14 华为技术有限公司 Model processing method and related device
CN112947933A (en) * 2021-02-24 2021-06-11 上海商汤智能科技有限公司 Operator execution method and device, computer equipment and storage medium
CN113051080A (en) * 2021-04-22 2021-06-29 杭州海康威视数字技术股份有限公司 Computation graph execution method and device and heterogeneous platform
CN116362305A (en) * 2021-12-24 2023-06-30 Oppo广东移动通信有限公司 Data processing method, device, computer equipment and storage medium
CN116432736A (en) * 2021-12-31 2023-07-14 华为技术有限公司 Neural network model optimization method and device and computing equipment
CN114819084B (en) * 2022-04-26 2024-03-01 北京百度网讯科技有限公司 Model reasoning method, device, equipment and storage medium
CN117521841A (en) * 2022-07-28 2024-02-06 华为技术有限公司 Deep learning system and method
CN115358379B (en) * 2022-10-20 2023-01-10 腾讯科技(深圳)有限公司 Neural network processing method, neural network processing device, information processing method, information processing device and computer equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108292241A * 2015-10-28 2018-07-17 谷歌有限责任公司 Processing computational graphs
CN109754073A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Data processing method, device, electronic equipment and readable storage medium storing program for executing
CN110689115A (en) * 2019-09-24 2020-01-14 上海寒武纪信息科技有限公司 Neural network model processing method and device, computer equipment and storage medium
CN111062467A (en) * 2019-12-18 2020-04-24 开放智能机器(上海)有限公司 Automatic neural network subgraph segmentation method applied to AI heterogeneous compiler
CN111860820A (en) * 2020-07-31 2020-10-30 北京灵汐科技有限公司 Neural network operator dividing method and device and dividing equipment

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN114691330A (en) * 2022-03-28 2022-07-01 北京百度网讯科技有限公司 Data processing method, data processing device, electronic equipment and storage medium
WO2023207361A1 (en) * 2022-04-26 2023-11-02 山东云海国创云计算装备产业创新中心有限公司 Memory management method, system, device and computer readable storage medium
CN115268877A (en) * 2022-09-27 2022-11-01 之江实验室 Intermediate representation method and device for parallel execution of graph computation
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation
CN115796228A (en) * 2022-11-15 2023-03-14 北京百度网讯科技有限公司 Operator fusion method, device, equipment and storage medium
CN115796228B (en) * 2022-11-15 2024-04-05 北京百度网讯科技有限公司 Operator fusion method, device, equipment and storage medium
CN117576125A (en) * 2024-01-16 2024-02-20 芯瞳半导体技术(山东)有限公司 Neural network calculation graph segmentation method, device, equipment and storage medium
CN117576125B (en) * 2024-01-16 2024-04-16 芯瞳半导体技术(山东)有限公司 Neural network calculation graph segmentation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111860820A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
WO2022022670A1 (en) Neural network computation graph processing method and apparatus, and processing device
JP6908682B2 (en) Execution of program specifications based on graphs
Karloff et al. A model of computation for MapReduce
US20130232476A1 (en) Automatic pipeline parallelization of sequential code
US20160342395A1 (en) Visual program specification and compilation of graph-based computation
US20140165049A1 (en) Compiler-controlled region scheduling for simd execution of threads
JP6763072B2 (en) Compile data processing graph
KR20170042801A (en) Visually specifying subsets of components in graph-based programs through user interactions
JP6778193B2 (en) Controlling data processing tasks
CN114936099B (en) Graph optimization method and device for neural network calculation
US11630986B2 (en) Graph conversion method
CN115033391B (en) Data flow method and device for neural network calculation
CN106293947B (en) GPU-CPU (graphics processing Unit-Central processing Unit) mixed resource allocation system and method in virtualized cloud environment
WO2022087788A1 (en) Neural network compiling optimization method and related apparatus
CN115794393A (en) Method, device, server and storage medium for executing business model
Sahni et al. Heterogeneity-aware elastic scaling of streaming applications on cloud platforms
Khasanov et al. Implicit data-parallelism in Kahn process networks: Bridging the MacQueen Gap
Goens et al. On the representation of mappings to multicores
Kienberger et al. Analysis and validation of AUTOSAR models
Shivashankar et al. Efficient software for programmable visual analysis using Morse-Smale complexes
CN115374914A (en) Distributed training method, parallel deep learning framework and electronic equipment
CN112000478B (en) Method and device for distributing operation resources
CN110969565A (en) Image processing method and device
CN114925591A (en) Automatic parallel strategy searching method based on polyhedron model modeling and related equipment
CN113326137A (en) Deep learning calculation method, device, chip and medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21849434

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.05.2023)

122 EP: PCT application non-entry in European phase

Ref document number: 21849434

Country of ref document: EP

Kind code of ref document: A1