WO2022022670A1 - Method and apparatus for processing an artificial neural network computation graph, and processing device - Google Patents


Info

Publication number
WO2022022670A1
WO2022022670A1 PCT/CN2021/109499 CN2021109499W
Authority
WO
WIPO (PCT)
Prior art keywords
node
calculation
graph
processor
neural network
Prior art date
Application number
PCT/CN2021/109499
Other languages
English (en)
Chinese (zh)
Inventor
戚海涛
李涵
吴欣洋
丁瑞强
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2022022670A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Definitions

  • the present invention relates to the technical field of network communication, and in particular to a method for processing a neural network computation graph, a processing apparatus, a processing device and a readable storage medium.
  • the neural network computation graph consists of a variety of operators, which can be regarded as operational functions, such as convolution and pooling.
  • CPU: Central Processing Unit
  • APU: Accelerated Processing Unit
  • the present invention provides a method for processing a neural network computation graph, a processing apparatus, a processing device and a readable storage medium.
  • a first aspect provides a method for processing a neural network computation graph, the processing method comprising: determining a processor type corresponding to each operator according to attribute information of each operator in the neural network computation graph; dividing the neural network computation graph into one or more calculation sub-graphs according to the processor type corresponding to each operator and the dependency relationships among the plurality of operators, wherein all operators in each calculation sub-graph correspond to one processor; and processing each calculation sub-graph by the processor corresponding to that calculation sub-graph.
  • a processing apparatus comprising: a determination module for determining a processor type corresponding to each operator according to attribute information of each operator in a neural network calculation graph; a division module for dividing the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies among the plurality of operators, wherein each computation subgraph corresponds to one processor; and a processing module configured to process each calculation subgraph through the processor corresponding to that calculation subgraph.
  • a processing device comprising: a processor, a memory, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for processing a neural network computation graph described in the first aspect.
  • a readable storage medium where a program is stored on the readable storage medium, and when the program is executed by a processor, the steps of the method for processing a neural network computation graph according to the first aspect are implemented.
  • in the embodiments of the present invention, the processor type corresponding to each operator is determined according to the attribute information of the operator; the neural network calculation graph is divided into one or more calculation subgraphs according to the processor type corresponding to each operator and the dependency relationships among the multiple operators, with each calculation subgraph corresponding to one kind of processor; and each calculation subgraph is processed by its corresponding processor. This realizes effective use of computing resources, improves the processing speed of the neural network computation graph, and effectively solves the problem that the processing of the neural network computation graph is slow because certain types of processors cannot process certain operators, or process them inefficiently.
  • FIG. 1 is a flowchart of a method for processing a neural network computation graph according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a specific implementation manner of dividing a neural network computation graph into one or more computation subgraphs.
  • FIG. 3 is a schematic diagram of a neural network computation graph.
  • FIG. 4 is a flowchart of a specific implementation manner of processing a calculation subgraph by a processor corresponding to the calculation subgraph.
  • FIG. 5 is a block diagram of a processing apparatus provided by an embodiment of the present invention.
  • FIG. 6 is a block diagram of the composition of a processing device according to an embodiment of the present invention.
  • words such as “exemplary” or “for example” are used to mean serving as an example, instance or illustration. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present invention should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “for example” is intended to present the related concepts in a concrete manner.
  • FIG. 1 is a flowchart of a method for processing a neural network computation graph according to an embodiment of the present invention.
  • an embodiment of the present invention provides a method for processing a neural network computation graph.
  • the specific steps include: step 101 , step 102 , step 103 and step 104 .
  • Step 101 Obtain attribute information of all operators in the neural network computational graph (Computational Graph).
  • a neural network can be a computing network composed of operators (Operator, OP).
  • the neural network computing graph can be composed of multiple operators, and an operator can be regarded as an operation function, such as convolution (Convolution), pooling (Pooling), full connection (Full Connection), etc.
  • the neural network is a tree-structured neural network model. It can be understood that the embodiment of the present invention does not limit the specific implementation form of the neural network model.
  • each operator has a different operation process and also has one or more attributes.
  • the attribute information of the operator may include a combination of one or more of the following: calculation amount, operator function, and operator parameter.
  • Step 102 Determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network calculation graph.
  • the type of the processor may include a central processing unit (Central Processing Unit, CPU), an accelerated processor (Accelerated Processing Unit, APU), a graphics processor (Graphics Processing Unit, GPU), a tensor processor (Tensor Processing Unit, TPU), etc.
  • the types of processors in the embodiments of the present invention are not limited to this, and may also include other types of processors.
  • the calculation amount, operator function and operator parameters of the operators corresponding to various types of processors may be predetermined, and a correspondence may be established between an operator's calculation amount, operator function, operator parameter and the processor type. In step 102, the processor type corresponding to each operator of the neural network calculation graph can then be obtained from this predetermined correspondence; the processor type corresponding to an operator refers to the type of processor used to process that operator.
  • for example, the neural network calculation graph includes operator 1 and operator 2. The attribute information of operator 1 includes one or more of the following: calculation amount A1, operator function A2 and operator parameter A3; according to the above correspondence, the processor type corresponding to calculation amount A1, operator function A2 and operator parameter A3 is the central processing unit, so the processor type corresponding to operator 1 is the central processing unit. The attribute information of operator 2 includes one or more of the following: calculation amount B1, operator function B2 and operator parameter B3; according to the above correspondence, the processor type corresponding to calculation amount B1, operator function B2 and operator parameter B3 is an accelerated processor, so the processor type corresponding to operator 2 is an accelerated processor.
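The correspondence described above can be sketched as a simple lookup table. This is an illustrative sketch only: the attribute values (A1, B1, ...) come from the example in the text, while the table and function names are hypothetical and not part of the disclosed method.

```python
# Hypothetical correspondence between an operator's attribute information
# (calculation amount, operator function, operator parameter) and the type
# of processor used to process that operator, as described for step 102.
CORRESPONDENCE = {
    ("A1", "A2", "A3"): "CPU",  # operator 1's attributes -> central processing unit
    ("B1", "B2", "B3"): "APU",  # operator 2's attributes -> accelerated processor
}

def processor_type_for(op_attrs):
    """Look up the processor type for an operator's attribute tuple."""
    return CORRESPONDENCE[op_attrs]

print(processor_type_for(("A1", "A2", "A3")))  # CPU
print(processor_type_for(("B1", "B2", "B3")))  # APU
```

In a real system the keys would be derived from the operator's actual attributes rather than opaque labels, but the lookup step itself is this simple.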
  • Step 103 Divide the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each operator and the dependencies between multiple operators, and all operators in each calculation subgraph correspond to a kind of processor.
  • each of the divided calculation subgraphs may correspond to a type of processor.
  • for example, the neural network calculation graph is divided into calculation subgraph 1 and calculation subgraph 2, wherein all operators in calculation subgraph 1 correspond to the CPU, that is, all operators in calculation subgraph 1 can be processed by the CPU, and all operators in calculation subgraph 2 correspond to the APU, that is, all operators in calculation subgraph 2 can be processed by the APU.
  • the number of calculation subgraphs into which the calculation graph is divided is not limited.
  • Step 104 Process the calculation subgraph by using a processor corresponding to the calculation subgraph.
  • the corresponding calculation subgraph is compiled by the processor corresponding to the calculation subgraph, or the corresponding calculation subgraph is compiled and executed by the processor corresponding to the calculation subgraph.
  • all operators in the calculation subgraph have the same processor type; that is, for each calculation subgraph, all operators in the calculation subgraph are processed by a processor of the corresponding processor type.
  • all operators in each calculation subgraph correspond to one processor of one processor type; that is, each calculation subgraph is processed by one processor of the corresponding processor type, which can effectively improve the processing efficiency of the computation graph.
  • multiple calculation subgraphs may correspond to the same processor of one processor type; that is, the same processor of the corresponding processor type processes the multiple computation subgraphs. For example, if the processor type corresponding to calculation subgraph 1 is a central processing unit and the processor type corresponding to calculation subgraph 2 is a central processing unit, calculation subgraph 1 and calculation subgraph 2 can be processed by the same central processing unit.
  • alternatively, multiple calculation subgraphs may correspond to different processors of the same processor type; that is, the multiple computation subgraphs are processed by multiple processors of the corresponding processor type. For example, if the processor types corresponding to calculation subgraph 1 and calculation subgraph 2 are both central processing units, calculation subgraph 1 can be processed by central processing unit 1 and calculation subgraph 2 by central processing unit 2, where central processing unit 1 and central processing unit 2 are different central processing units of the same processor type.
  • the processor type corresponding to each operator can be determined according to the attribute information of the operator, and the neural network calculation graph is divided into one or more calculation subgraphs according to the processor type corresponding to each operator and the dependencies between the multiple operators. Each calculation subgraph corresponds to one processor and is processed by that processor, thereby realizing effective use of computing resources, improving the processing speed of the neural network computation graph, and effectively solving the problem that the processing of the neural network computation graph is slow because certain types of processors cannot process certain operators, or process them inefficiently.
  • FIG. 2 is a flowchart of a specific implementation manner of dividing a neural network calculation graph into one or more calculation subgraphs.
  • the specific steps of step 103 include steps 201 to 203 .
  • Step 201 Traverse each computing branch of the neural network computing graph along the first direction, where the computing branch includes a plurality of nodes, and each node corresponds to an operator.
  • the neural network computation graph may include multiple computation branches, each computation branch may include multiple nodes, and each node corresponds to an operator in the computation graph.
  • Step 202 For the current computing branch, traverse each node on the current computing branch along the second direction, and set the first direction and the second direction to intersect.
  • the second direction is the reverse direction of the processing order of each operator in the calculation diagram.
  • for example, if the processing order of each operator in the calculation graph is from top to bottom, then the reverse direction is from bottom to top; correspondingly, the first direction may be a left-to-right direction or a right-to-left direction.
  • Step 203 Based on the traversal order, according to the processor type corresponding to each node, the dependency relationship between multiple nodes, and the preset division rule, divide the neural network computation graph into one or more computation subgraphs.
  • the dependencies between the multiple nodes may include: the hierarchical relationship of each operator in the neural network and/or the processing order of each operator.
  • the dependency relationship may include: parent-child node relationship, sibling node relationship, and the like.
  • parent nodes have children. Child nodes of the same level are called sibling (brother or sister) nodes.
  • the top node is called the root node. Except for the root node, every other node has a parent node.
  • the processing order of the above-mentioned nodes refers to the compilation order of each operator in the neural network computation graph, or the compilation and execution order.
  • the traversal order is the traversal order determined according to step 201 and step 202, that is, the order of traversing each computation branch of the computation graph along the first direction and the order of traversing each node of the current computation branch along the second direction, Based on the traversal order, each computing branch and each node of each computing branch are traversed in sequence to determine the computing subgraph to which each node belongs, thereby dividing one or more computing subgraphs.
  • the preset division rules may include: a first division rule.
  • the first division rule: for the current node in the current computing branch, if the current node and the previous node correspond to the same processor type, the current node is divided into the computing subgraph where the previous node is located; if they correspond to different processor types, the current node and the previous node are divided into different calculation subgraphs.
  • if the current node is the first node of the first calculation branch in the traversal order, the current node has no previous node in the traversal order; the current node is directly divided into a calculation subgraph, and the next node is traversed. If the current node is not the first node of the first calculation branch in the traversal order, then according to the traversal order, the dependency between the current node and its previous node is a parent-child relationship, with the current node being the parent node and the previous node being a child node.
  • the nodes included in different calculation subgraphs are different, that is, the same nodes do not exist between different calculation subgraphs.
  • the preset division rule may further include: a second division rule.
  • the second division rule: if the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is not the last child node of that next node along the first direction, then the next node of the current node is not traversed, and each node of the next calculation branch is traversed instead; if the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is the last child node of that next node along the first direction, then traversal continues with the next node in the current calculation branch.
  • the dependency relationship between the current node and the next node of the current node is a parent-child node relationship
  • the next node is a parent node
  • the current node is a child node.
  • the preset division rule may further include: a third division rule.
  • the third division rule: if the calculation subgraph where the current node is located has multiple parent nodes in the neural network computation graph, and at least one of those parent nodes corresponds to a processor type different from the processor type corresponding to the current node, then it is determined that each of those parent nodes belongs to a calculation subgraph different from that of the current node.
  • the neural network computation graph may be divided into one or more computation subgraphs based on the foregoing division rules.
  • for example, the neural network computation graph may be divided into a single computation subgraph; that is to say, the computation subgraph is the neural network computation graph itself.
  • if the neural network computation graph includes multiple branches, and the operator of the parent node and the operator of the child node on each branch have different processor types, then based on the above division rules the neural network computation graph can be divided into multiple computation subgraphs.
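As a rough illustration of the first division rule in isolation (the second and third rules depend on the branch traversal and parent-node checks and are omitted here), nodes visited in traversal order can be grouped as follows. All data structures and names are hypothetical sketches, not the patent's implementation:

```python
def divide_by_first_rule(nodes, proc_type):
    """First division rule, sketched: nodes is a list in traversal order,
    proc_type maps each node to its processor type. A node joins the
    previous node's subgraph when their processor types match; otherwise
    a new subgraph is opened."""
    subgraphs = []
    for node in nodes:
        if subgraphs and proc_type[node] == proc_type[subgraphs[-1][-1]]:
            subgraphs[-1].append(node)  # same type as previous node: share its subgraph
        else:
            subgraphs.append([node])    # different type (or first node): new subgraph
    return subgraphs

# A chain whose middle operator needs a CPU splits into three subgraphs:
types = {"N6": "APU", "N5": "APU", "N7": "CPU", "N2": "APU"}
print(divide_by_first_rule(["N6", "N5", "N7", "N2"], types))
# [['N6', 'N5'], ['N7'], ['N2']]
```

Note that, per the first rule alone, two subgraphs of the same processor type can still be separated when a differently-typed operator sits between them, which matches the CPU node splitting the APU chain above.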
  • FIG. 3 is a schematic diagram of a neural network computation graph.
  • the neural network computation graph includes nodes N1 to N9, and the processing order of each node in the neural network computation graph is from top to bottom.
  • with the dependency relationships shown in FIG. 3, and taking the second direction as the bottom-to-top direction and the first direction as the left-to-right direction as an example, the preset division rules in the embodiment of the present invention are illustrated below.
  • assume that the processor type corresponding to the operator of node N7 is CPU, and that the processor type corresponding to the operators of the other nodes (N1 to N6, N8 and N9) is APU. All operators in the network are divided below based on the processor type corresponding to each operator, the dependencies among the multiple operators, and the preset division rules.
  • the first calculation branch is the leftmost branch, that is, the branch formed by nodes N6, N5, N4, N3, and N2.
  • the first node of this branch in the traversal order is N6. Since N6 is the first node of the first calculation branch, node N6 is divided into a calculation subgraph, and the next node N5 is traversed.
  • node N5 is the parent node of node N6, and the processor types corresponding to node N5 and node N6 are the same. According to the first division rule, node N5 and node N6 are divided into the same calculation subgraph, which is defined as calculation subgraph A.
  • node N4 is the parent node of N5, and the processor types corresponding to N4 and N5 are the same; according to the first division rule, node N4 is divided into calculation subgraph A. Node N3 is the parent node of node N4, and the processor type corresponding to node N3 is the same as that corresponding to calculation subgraph A; according to the first division rule, node N3 is divided into calculation subgraph A.
  • the parent node of node N3 of the first calculation branch is node N2. Node N2 has multiple child nodes (nodes N3, N7 and N9), and node N3 is not the last child node of node N2 along the first direction; the last child node of node N2 along the first direction is node N9. Therefore, according to the second division rule, node N2 is not traversed for the first calculation branch; instead, each node of the next calculation branch is traversed, namely the branch formed by nodes N6, N8, N7 and N2.
  • in the branch formed by nodes N6, N8, N7 and N2, node N6 has already been traversed, so it is not traversed again. According to the traversal order, node N8 is traversed next. The processor type corresponding to node N8 is the same as that of node N6, that is, the processor types corresponding to node N8 and calculation subgraph A are the same, so node N8 is divided into calculation subgraph A according to the first division rule.
  • following the traversal order, the next node after node N8, namely node N7, is traversed.
  • the processor type corresponding to node N7 is different from the processor type corresponding to node N8, so node N7 is divided into a separate calculation subgraph, which is defined as calculation subgraph B.
  • the parent node of node N7 is node N2, and node N7 is not the last child node of node N2 along the first direction. Therefore, node N2 is not traversed; instead, each node of the next computing branch is traversed, that is, the branch formed by nodes N6, N8, N9, N2 and N1.
  • in the branch formed by nodes N6, N8, N9, N2 and N1, nodes N6 and N8 have already been traversed, so they are not traversed again. According to the traversal order, node N9 is traversed next. Calculation subgraph A has three parent nodes: node N2, node N7 and node N9. The processor type corresponding to node N7 differs from that of calculation subgraph A: the processor type corresponding to node N7 is CPU, while the processor type corresponding to calculation subgraph A is APU. According to the third division rule, each of node N2, node N7 and node N9 must belong to a calculation subgraph different from calculation subgraph A.
  • although the processor type corresponding to node N9 is the same as that corresponding to calculation subgraph A, node N9 is not divided into calculation subgraph A; instead, node N9 is divided into a separate calculation subgraph, which is defined as calculation subgraph C.
  • the parent node of node N9 is node N2, and node N9 is the last child node of node N2 along the first direction. Therefore, according to the second division rule, node N2 is traversed next. Since the processor types corresponding to node N2 and node N9 are the same, node N2 is divided into calculation subgraph C according to the first division rule.
  • the root node N1 and the node N2 have the same processor type, and the node N1 is divided into the calculation subgraph C according to the first division rule.
  • thus, calculation subgraph A, calculation subgraph B and calculation subgraph C are obtained. Calculation subgraph A and calculation subgraph C correspond to the same processor type, APU, while calculation subgraph B corresponds to the CPU.
  • computational subgraphs A and C may be processed by the same APU, while computational subgraph B may be processed by the CPU.
  • computational subgraph A may be processed by one APU
  • computational subgraph C may be processed by another APU
  • computational subgraph B may be processed by the CPU.
  • each calculation subgraph corresponds to one processor, and the processor type corresponding to each calculation subgraph is determined according to the attribute information of the operators in that subgraph. This realizes effective utilization of computing resources, improves the processing speed of the neural network computation graph, and effectively solves the problem that the processing of the neural network computation graph is slow because certain types of processors cannot process certain operators, or process them inefficiently.
  • FIG. 4 is a flowchart of a specific implementation manner of processing a calculation subgraph by a processor corresponding to the calculation subgraph.
  • the step of processing the calculation subgraph by the processor corresponding to the calculation subgraph, that is, step 104, may further include steps 401 to 403.
  • Step 401 According to the dependencies among the multiple operators in the neural network computation graph, determine the dependencies between the respective computation subgraphs.
  • Step 402 Obtain the processing sequence of each calculation subgraph according to the dependency relationship between each calculation subgraph.
  • the above-mentioned dependencies between the calculation subgraphs are used to represent the hierarchical relationship or processing order of the calculation subgraphs in the calculation graph. Based on the dependencies between the calculation subgraphs, the compilation sequence or the compilation and execution sequence of each calculation subgraph can be determined.
  • for example, calculation subgraph A, calculation subgraph B and calculation subgraph C are divided according to the above division method. From the dependencies among the operators of calculation subgraphs C, B and A, it can be seen that the output of calculation subgraph C is connected to node N7 of calculation subgraph B and to nodes N3 and N8 of calculation subgraph A, and the output of calculation subgraph B is connected to node N8 of calculation subgraph A.
  • Step 403 According to the processing order of each calculation sub-graph, the calculation sub-graph is processed by the processor corresponding to the calculation sub-graph.
  • multiple operators are divided into one or more calculation subgraphs according to their corresponding processor types and the dependencies between the operators, and each calculation subgraph is processed by its corresponding processor. This facilitates configuring the input and output relationships between the processors, ensures the accuracy of operator operations in the neural network, and improves the efficiency of operation processing.
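Steps 401 to 403 amount to ordering the subgraphs by their dependencies before dispatching each one to its processor. One standard way to derive such a processing order is a topological sort (Kahn's algorithm); the sketch below uses the subgraph dependencies of the FIG. 3 example (C feeds B and A, B feeds A), with hypothetical data structures:

```python
from collections import deque

# Subgraph-level dependencies derived from the operator dependencies
# (step 401): each subgraph maps to the subgraphs it depends on.
DEPENDS_ON = {"C": [], "B": ["C"], "A": ["C", "B"]}

def processing_order(depends_on):
    """Step 402 sketch: Kahn's topological sort over subgraph dependencies."""
    indegree = {g: len(preds) for g, preds in depends_on.items()}
    successors = {g: [] for g in depends_on}
    for g, preds in depends_on.items():
        for p in preds:
            successors[p].append(g)
    ready = deque(sorted(g for g, d in indegree.items() if d == 0))
    order = []
    while ready:
        g = ready.popleft()
        order.append(g)           # g's inputs are all available: process it next
        for s in successors[g]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

print(processing_order(DEPENDS_ON))  # ['C', 'B', 'A']
```

Step 403 would then walk this order, handing each subgraph to the processor it was assigned during division.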
  • FIG. 5 is a block diagram of a processing apparatus provided by an embodiment of the present invention.
  • an embodiment of the present invention provides an apparatus 500 for processing a neural network computation graph.
  • the processing apparatus 500 is used to process a neural network computation graph.
  • the apparatus 500 includes: an acquisition module 501 , a determination module 502 , and a division module 503 and processing module 504 .
  • the obtaining module 501 is used for obtaining attribute information of all operators in the neural network calculation graph.
  • the attribute information of the operator includes a combination of one or more of the following: calculation amount, operator function and operator parameter.
  • the determining module 502 is configured to determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network calculation graph.
  • the division module 503 is used to divide the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each operator and the dependencies between multiple operators, and each calculation subgraph corresponds to a processor .
  • the processing module 504 is configured to process the corresponding calculation subgraph through the processor corresponding to the calculation subgraph.
  • all operators in the computation subgraph correspond to the same processor type. In some embodiments, a plurality of the computing subgraphs corresponding to the same processor type correspond to the same processor.
  • the dividing module 503 is further configured to: traverse each computation branch of the neural network computation graph along the first direction, where the computation branch includes a plurality of nodes and each node corresponds to an operator; for the current computation branch, traverse each node on the current computation branch along the second direction, the first direction intersecting the second direction; and, based on the traversal order, divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each node, the dependency relationships among the nodes, and the preset division rule.
  • the preset division rule includes: a first division rule.
  • the first division rule includes: for the current node in the current computing branch, if the current node and the previous node correspond to the same processor type, the current node is divided into the computing subgraph where the previous node is located; if they correspond to different processor types, the current node and the previous node are divided into different computing subgraphs; wherein the same node does not exist in different computing subgraphs.
  • the preset division rule further includes: a second division rule.
  • the second division rule includes: if the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is not the last child node of that next node along the first direction, then each node of the next calculation branch is traversed; if the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is the last child node of that next node along the first direction, then traversal continues with the next node in the current calculation branch.
  • the preset division rule further includes: a third division rule.
  • the third division rule includes: if the computation subgraph in which the current node is located has multiple parent nodes in the neural network computation graph, and at least one of those parent nodes corresponds to a processor type different from that of the current node, then each parent node is determined to belong to a computation subgraph different from that of the current node.
  • the processing module 504 is further configured to: determine the dependency relationships among the computation subgraphs according to the dependency relationships among the multiple operators in the neural network computation graph; obtain the processing order of the computation subgraphs according to those dependency relationships; and process each computation subgraph, in that order, by the processor corresponding to it.
  • the processing apparatus 500 for a neural network computation graph provided by this embodiment of the present invention can implement the steps and processes of the above processing method and achieve the same technical effects.
  • FIG. 6 is a block diagram of a processing device according to an embodiment of the present invention.
  • an embodiment of the present invention further provides a processing device 600. The processing device 600 includes a processor 601, a memory 602, and a program or instruction stored in the memory 602 and executable on the processor 601. When the program or instruction is executed by the processor 601, the steps and processes of the above processing method are implemented, and the same technical effects can be achieved.
  • An embodiment of the present invention further provides a readable storage medium storing a program or instruction. When the program or instruction is executed by a processor, the steps and processes of the above processing method can be implemented and the same technical effects achieved; to avoid repetition, details are not described here again.
  • the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof.
  • the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
  • Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.
  • embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
  • Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present invention. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
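The first division rule described above can be illustrated with a minimal sketch. The graph representation, node names, and processor-type labels below are illustrative assumptions, not the patented implementation: the sketch simply walks a computation branch in traversal order, keeps consecutive nodes whose operators map to the same processor type in one computation subgraph, and starts a new subgraph whenever the processor type changes.

```python
# Illustrative sketch of the first division rule: walk a computation
# branch in order and group consecutive nodes whose operators map to
# the same processor type into one computation subgraph.
# Node names and processor types below are hypothetical examples.

def divide_branch(branch):
    """branch: list of (node_name, processor_type) in traversal order."""
    subgraphs = []
    for name, ptype in branch:
        # Same type as the previous node -> same subgraph;
        # different type -> start a new subgraph.
        if subgraphs and subgraphs[-1]["type"] == ptype:
            subgraphs[-1]["nodes"].append(name)
        else:
            subgraphs.append({"type": ptype, "nodes": [name]})
    return subgraphs

branch = [("conv1", "NPU"), ("relu1", "NPU"),
          ("sort1", "CPU"), ("conv2", "NPU")]
print(divide_branch(branch))
# -> [{'type': 'NPU', 'nodes': ['conv1', 'relu1']},
#     {'type': 'CPU', 'nodes': ['sort1']},
#     {'type': 'NPU', 'nodes': ['conv2']}]
```

Note that, as in the example, two subgraphs may share the same processor type (here the two NPU subgraphs) while remaining distinct because a node of a different type lies between them on the branch.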

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a neural network computation graph processing method and apparatus, a processing device, and a readable storage medium. The processing method comprises: determining, according to attribute information of each operator in a neural network computation graph, the processor type corresponding to each operator; dividing the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependency relationships among the plurality of operators, each computation subgraph corresponding to one processor; and processing each computation subgraph by means of the processor corresponding to it.
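The last step of the method summarized above (processing the computation subgraphs in an order derived from their dependency relationships) can be sketched as a topological sort over the subgraph dependency graph. The subgraph identifiers and the dependency map below are illustrative assumptions, not part of the disclosed implementation:

```python
# Illustrative sketch: derive a processing order for computation
# subgraphs from their dependency relationships (a topological sort).

def processing_order(deps):
    """deps maps each subgraph to the set of subgraphs it depends on."""
    remaining = {g: set(d) for g, d in deps.items()}
    order = []
    while remaining:
        # Subgraphs whose dependencies are all satisfied can be processed.
        ready = sorted(g for g, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic dependency between subgraphs")
        for g in ready:
            order.append(g)
            del remaining[g]
        # Mark the processed subgraphs as satisfied dependencies.
        for d in remaining.values():
            d.difference_update(ready)
    return order

# Hypothetical example: B and C each depend on A; D depends on B and C.
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(processing_order(deps))  # -> ['A', 'B', 'C', 'D']
```

Subgraphs that become ready in the same round (here B and C) have no dependency on each other, so their corresponding processors could in principle handle them concurrently.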
PCT/CN2021/109499 2020-07-31 2021-07-30 Neural network computation graph processing method and apparatus, and processing device WO2022022670A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010757561.1 2020-07-31
CN202010757561.1A CN111860820A (zh) 2020-07-31 2020-07-31 Neural network operator partitioning method, apparatus, and partitioning device

Publications (1)

Publication Number Publication Date
WO2022022670A1 true WO2022022670A1 (fr) 2022-02-03

Family

ID=72953458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109499 WO2022022670A1 (fr) 2020-07-31 2021-07-30 Neural network computation graph processing method and apparatus, and processing device

Country Status (2)

Country Link
CN (1) CN111860820A (fr)
WO (1) WO2022022670A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691330A (zh) * 2022-03-28 2022-07-01 北京百度网讯科技有限公司 Data processing method and apparatus, electronic device, and storage medium
CN115268877A (zh) * 2022-09-27 2022-11-01 之江实验室 Intermediate representation method and apparatus for parallel execution of graph computation
CN115796228A (zh) * 2022-11-15 2023-03-14 北京百度网讯科技有限公司 Operator fusion method, apparatus, device, and storage medium
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation
WO2023207361A1 (fr) * 2022-04-26 2023-11-02 山东云海国创云计算装备产业创新中心有限公司 Memory management method, system, device, and computer-readable storage medium
CN117576125A (zh) * 2024-01-16 2024-02-20 芯瞳半导体技术(山东)有限公司 Neural network computation graph partitioning method, apparatus, device, and storage medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860820A (zh) * 2020-07-31 2020-10-30 北京灵汐科技有限公司 Neural network operator partitioning method, apparatus, and partitioning device
CN114626284A (zh) * 2020-12-14 2022-06-14 华为技术有限公司 Model processing method and related apparatus
CN112947933A (zh) * 2021-02-24 2021-06-11 上海商汤智能科技有限公司 Operator execution method and apparatus, computer device, and storage medium
CN113051080A (zh) * 2021-04-22 2021-06-29 杭州海康威视数字技术股份有限公司 Computation graph execution method and apparatus, and heterogeneous platform
CN116362305A (zh) * 2021-12-24 2023-06-30 Oppo广东移动通信有限公司 Data processing method and apparatus, computer device, and storage medium
CN116432736A (zh) * 2021-12-31 2023-07-14 华为技术有限公司 Neural network model optimization method and apparatus, and computing device
CN114819084B (zh) * 2022-04-26 2024-03-01 北京百度网讯科技有限公司 Model inference method, apparatus, device, and storage medium
CN117521841A (zh) * 2022-07-28 2024-02-06 华为技术有限公司 Deep learning system and method
CN115358379B (zh) * 2022-10-20 2023-01-10 腾讯科技(深圳)有限公司 Neural network processing and information processing method, apparatus, and computer device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108292241A (zh) * 2015-10-28 2018-07-17 谷歌有限责任公司 Processing computational graphs
CN109754073A (zh) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Data processing method and apparatus, electronic device, and readable storage medium
CN110689115A (zh) * 2019-09-24 2020-01-14 上海寒武纪信息科技有限公司 Neural network model processing method and apparatus, computer device, and storage medium
CN111062467A (zh) * 2019-12-18 2020-04-24 开放智能机器(上海)有限公司 Automatic neural network subgraph partitioning method for AI heterogeneous compilers
CN111860820A (zh) * 2020-07-31 2020-10-30 北京灵汐科技有限公司 Neural network operator partitioning method, apparatus, and partitioning device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691330A (zh) * 2022-03-28 2022-07-01 北京百度网讯科技有限公司 Data processing method and apparatus, electronic device, and storage medium
WO2023207361A1 (fr) * 2022-04-26 2023-11-02 山东云海国创云计算装备产业创新中心有限公司 Memory management method, system, device, and computer-readable storage medium
CN115268877A (zh) * 2022-09-27 2022-11-01 之江实验室 Intermediate representation method and apparatus for parallel execution of graph computation
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation
CN115796228A (zh) * 2022-11-15 2023-03-14 北京百度网讯科技有限公司 Operator fusion method, apparatus, device, and storage medium
CN115796228B (zh) * 2022-11-15 2024-04-05 北京百度网讯科技有限公司 Operator fusion method, apparatus, device, and storage medium
CN117576125A (zh) * 2024-01-16 2024-02-20 芯瞳半导体技术(山东)有限公司 Neural network computation graph partitioning method, apparatus, device, and storage medium
CN117576125B (zh) * 2024-01-16 2024-04-16 芯瞳半导体技术(山东)有限公司 Neural network computation graph partitioning method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN111860820A (zh) 2020-10-30

Similar Documents

Publication Publication Date Title
WO2022022670A1 (fr) Neural network computation graph processing method and apparatus, and processing device
JP6908682B2 (ja) Execution of graph-based program specifications
Karloff et al. A model of computation for MapReduce
US20160342395A1 (en) Visual program specification and compilation of graph-based computation
US20140165049A1 (en) Compiler-controlled region scheduling for simd execution of threads
KR20170042801A (ko) Visual specification of a subset of components in a graph-based program through user interaction
JP6778193B2 (ja) Control of data processing tasks
JP2020013608A (ja) Compilation of data processing graphs
WO2024021192A1 (fr) Graph optimization method and apparatus for neural network computation
US11630983B2 (en) Graph conversion method
CN106293947B (zh) GPU-CPU hybrid resource allocation system and method in a virtualized cloud environment
WO2022087788A1 (fr) Neural network compilation optimization method and related apparatus
CN115033391A (zh) Data flow method and apparatus for neural network computation
Saleh et al. Partitioning for scalable complex event processing on data streams
Sahni et al. Heterogeneity-aware elastic scaling of streaming applications on cloud platforms
US20240095241A1 (en) Data search method and apparatus, and device
Khasanov et al. Implicit data-parallelism in Kahn process networks: Bridging the MacQueen Gap
US20220253482A1 (en) Memory bandwidth allocation for multi-tenant fpga cloud infrastructures
Kienberger et al. Analysis and validation of AUTOSAR models
CN113326137B (zh) Deep learning computation method, apparatus, chip, and medium
Shivashankar et al. Efficient software for programmable visual analysis using Morse-Smale complexes
CN115374914A (zh) Distributed training method, parallel deep learning framework, and electronic device
Legaux et al. Development effort and performance trade-off in high-level parallel programming
CN112000478B (zh) Job running resource allocation method and apparatus
CN110969565A (zh) Image processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21849434

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.05.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21849434

Country of ref document: EP

Kind code of ref document: A1