WO2022022670A1

WO2022022670A1 - Neural network computation graph processing method and apparatus, and processing device

Info

Publication number: WO2022022670A1
Application number: PCT/CN2021/109499
Authority: WO
Inventors: 戚海涛; 李涵; 吴欣洋; 丁瑞强; 祝夭龙
Original assignee: 北京灵汐科技有限公司
Priority date: 2020-07-31
Filing date: 2021-07-30
Publication date: 2022-02-03
Also published as: CN111860820A

Abstract

Provided are a neural network computation graph processing method, a processing apparatus, a processing device and a readable storage medium. The processing method comprises: determining, according to attribute information of each operator in a neural network computation graph, a processing unit type corresponding to each operator; dividing the neural network computation graph into one or more computation sub-graphs according to the processing unit type corresponding to each operator and a dependency relationship between a plurality of operators, wherein each computation sub-graph corresponds to a processing unit; and processing the computation sub-graphs by means of processing units corresponding to the computation sub-graphs.

Description

Method, device and processing device for processing neural network computing graph

Technical field

The present invention relates to the technical field of network communication, and in particular to a method for processing a neural network computation graph, a processing apparatus, a processing device and a readable storage medium.

Background technique

The neural network computation graph consists of a variety of operators, which can be regarded as operational functions, such as convolution and pooling.

Although a central processing unit (CPU) can run all operators, operators such as convolution and full connection (for example, 1024×4096) require a large amount of computation, so running them on the CPU is slow and inefficient. In such cases, a dedicated accelerated processor (Accelerated Processing Unit, APU) can be used instead. However, an APU often supports only a limited set of instructions, and some special operators cannot be run on it.

Therefore, how to allocate the processors for processing the operators of the neural network is a technical problem that needs to be solved urgently.

Summary of the invention

The present invention provides a method for processing a neural network computation graph, a processing apparatus, a processing device and a readable storage medium.

A first aspect provides a method for processing a neural network computation graph, the processing method comprising: determining a processor type corresponding to each operator according to attribute information of each operator in the neural network computation graph; dividing the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependency relationships among the plurality of operators, wherein all operators in each computation subgraph correspond to one type of processor; and processing each computation subgraph by the processor corresponding to that computation subgraph.

In a second aspect, a processing apparatus is provided, the processing apparatus comprising: a determination module for determining a processor type corresponding to each operator according to attribute information of each operator in a neural network computation graph; a division module for dividing the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each operator and the dependency relationships among the plurality of operators, wherein each computation subgraph corresponds to one type of processor; and a processing module for processing each computation subgraph by the processor corresponding to that computation subgraph.

In a third aspect, a processing device is provided, the processing device comprising: a processor, a memory, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for processing a neural network computation graph described in the first aspect.

In a fourth aspect, a readable storage medium is provided, where a program is stored on the readable storage medium, and when the program is executed by a processor, the steps of the method for processing a neural network computation graph according to the first aspect are implemented.

In the neural network computation graph processing method, processing apparatus, processing device and readable storage medium provided by the present disclosure, the processor type corresponding to each operator is determined according to the attribute information of the operator; the neural network computation graph is divided into one or more computation subgraphs according to the processor type corresponding to each operator and the dependency relationships among the operators, each computation subgraph corresponding to one type of processor; and each computation subgraph is processed by its corresponding processor. This makes effective use of computing resources, improves the processing speed of the neural network computation graph, and addresses the slow processing that arises when certain types of processors cannot process certain operators or process them inefficiently.

Description of drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are for the purpose of illustrating preferred embodiments only and are not to be considered limiting of the invention. The same components are denoted by the same reference numerals throughout the drawings. In the drawings:

FIG. 1 is a flowchart of a method for processing a neural network computation graph according to an embodiment of the present invention.

FIG. 2 is a flowchart of a specific implementation manner of dividing a neural network computation graph into one or more computation subgraphs.

FIG. 3 is a schematic diagram of a neural network computation graph.

FIG. 4 is a flowchart of a specific implementation manner of processing a calculation subgraph by a processor corresponding to the calculation subgraph.

FIG. 5 is a block diagram of a processing apparatus provided by an embodiment of the present invention.

FIG. 6 is a block diagram of the composition of a processing device according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

The term "comprising" and any variations thereof in the description and claims of the present invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device. In addition, "and/or" in the description and the claims indicates at least one of the connected objects; for example, "A and/or B" covers three cases: A alone, B alone, and both A and B.

In the embodiments of the present invention, words such as "exemplary" or "for example" are used to mean serving as an example, instance or illustration. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the related concepts in a specific manner.

Referring to FIG. 1, an embodiment of the present invention provides a method for processing a neural network computation graph. The method includes step 101, step 102, step 103 and step 104.

Step 101: Obtain attribute information of all operators in the neural network computational graph (Computational Graph).

It can be understood that a neural network can be a computing network composed of operators (Operator, OP). The neural network computation graph can be composed of multiple operators, and operators can be regarded as operation functions, such as convolution, pooling (Pooling) and full connection. Optionally, the neural network is a tree-structured neural network model. It can be understood that the embodiment of the present invention does not limit the specific implementation form of the neural network model.

In this embodiment of the present invention, each operator has a different operation process and also has one or more attributes, and the attribute information of the operator may include a combination of one or more of the following: calculation amount, operator function, and operator parameter.

Step 102: Determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network calculation graph.

In this embodiment of the present invention, the type of the processor (also referred to as a processing unit) may include a central processing unit (Central Processing Unit, CPU), an accelerated processor (Accelerated Processing Unit, APU), a graphics processor (Graphics Processing Unit, GPU), a tensor processor (Tensor Processing Unit, TPU), and so on. The processor types in the embodiments of the present invention are not limited to these and may also include other types of processors.

In some embodiments, the computation amount, operator function and operator parameters of the operators suited to each type of processor may be determined in advance, and a correspondence between computation amount, operator function, operator parameter and processor type may be established. In step 102, the processor type corresponding to each operator of the neural network computation graph can then be obtained from this predetermined correspondence. The processor type corresponding to an operator is the type of processor used to process that operator.

For example, the neural network computation graph includes operator 1 and operator 2. The attribute information of operator 1 includes one or more of the following: computation amount A1, operator function A2 and operator parameter A3. According to the above correspondence, the processor type corresponding to computation amount A1, operator function A2 and operator parameter A3 is the central processing unit, so the processor type corresponding to operator 1 is the central processing unit. The attribute information of operator 2 includes one or more of the following: computation amount B1, operator function B2 and operator parameter B3. According to the above correspondence, the processor type corresponding to computation amount B1, operator function B2 and operator parameter B3 is the accelerated processor, so the processor type corresponding to operator 2 is the accelerated processor.
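The lookup described in the example above can be sketched as a simple table keyed by attribute information. This is only an illustration: the class names `OperatorAttributes` and `ProcessorType`, the table `CORRESPONDENCE` and the function `processor_type_for` are hypothetical, and the labels A1 to B3 are kept symbolic because the description does not give concrete values.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessorType(Enum):
    CPU = "central processing unit"
    APU = "accelerated processor"


@dataclass(frozen=True)
class OperatorAttributes:
    """Attribute information of one operator (symbolic labels only)."""
    computation_amount: str
    function: str
    parameter: str


# Hypothetical predetermined correspondence between attribute
# information and processor type, mirroring the example in the text.
CORRESPONDENCE = {
    OperatorAttributes("A1", "A2", "A3"): ProcessorType.CPU,
    OperatorAttributes("B1", "B2", "B3"): ProcessorType.APU,
}


def processor_type_for(attrs: OperatorAttributes) -> ProcessorType:
    """Return the processor type recorded for this attribute information."""
    return CORRESPONDENCE[attrs]


operators = {
    "operator 1": OperatorAttributes("A1", "A2", "A3"),
    "operator 2": OperatorAttributes("B1", "B2", "B3"),
}
assignment = {name: processor_type_for(a) for name, a in operators.items()}
for name, ptype in assignment.items():
    print(name, "->", ptype.value)
```

A frozen dataclass is used here only so that the attribute record is hashable and can serve as a dictionary key; a real correspondence would more likely be expressed as rules over ranges of computation amount than as an exact-match table.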

Step 103: Divide the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each operator and the dependencies between multiple operators, and all operators in each calculation subgraph correspond to a kind of processor.

In this embodiment of the present invention, each computation subgraph obtained by the division may correspond to one type of processor. For example, the neural network computation graph is divided into computation subgraph 1 and computation subgraph 2, where all operators in computation subgraph 1 correspond to the CPU, that is, all operators in computation subgraph 1 can be processed by the CPU, and all operators in computation subgraph 2 correspond to the APU, that is, all operators in computation subgraph 2 can be processed by the APU.

It can be understood that, in this embodiment of the present invention, the number of calculation subgraphs into which the calculation graph is divided is not limited.

Step 104: Process the calculation subgraph by using a processor corresponding to the calculation subgraph.

Specifically, the corresponding calculation subgraph is compiled by the processor corresponding to the calculation subgraph, or the corresponding calculation subgraph is compiled and executed by the processor corresponding to the calculation subgraph.

In this embodiment of the present invention, all operators in a computation subgraph have the same processor type; that is, for each computation subgraph, all operators in the computation subgraph are processed by a processor of the corresponding processor type.

In some embodiments, all operators in each computation subgraph correspond to one processor of one processor type; that is, each computation subgraph is processed by a single processor of the corresponding processor type, which can effectively improve the processing efficiency of the computation graph.

In some embodiments, in order to save computing resources, when multiple computation subgraphs correspond to the same processor type, those computation subgraphs may correspond to the same processor of that processor type; that is, a single processor of the corresponding processor type processes all of them. For example, if the processor type corresponding to computation subgraph 1 is the central processing unit and the processor type corresponding to computation subgraph 2 is also the central processing unit, computation subgraph 1 and computation subgraph 2 can be processed by the same central processing unit.

In some embodiments, when multiple computation subgraphs correspond to the same processor type, those computation subgraphs may correspond to different processors of the same processor type; that is, the computation subgraphs are processed by multiple processors of the corresponding processor type. For example, if the processor type corresponding to computation subgraph 1 is the central processing unit and the processor type corresponding to computation subgraph 2 is also the central processing unit, computation subgraph 1 can be processed by central processing unit 1 and computation subgraph 2 can be processed by central processing unit 2, where central processing unit 1 and central processing unit 2 are different central processing units of the same processor type.
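The two allocation options just described (one shared processor per type, or a separate processor per subgraph) can be contrasted in a short sketch. The function names `assign_shared` and `assign_distinct` and the "CPU 1" style instance labels are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import count


def assign_shared(subgraph_types):
    """Every subgraph of a given processor type uses the same processor."""
    return {sg: f"{ptype} 1" for sg, ptype in subgraph_types.items()}


def assign_distinct(subgraph_types):
    """Each subgraph gets its own processor of its processor type."""
    counters = defaultdict(lambda: count(1))  # per-type instance numbering
    return {
        sg: f"{ptype} {next(counters[ptype])}"
        for sg, ptype in subgraph_types.items()
    }


subgraph_types = {"subgraph 1": "CPU", "subgraph 2": "CPU"}
print(assign_shared(subgraph_types))
print(assign_distinct(subgraph_types))
```

Sharing one processor saves computing resources, while distinct processors allow subgraphs of the same type to be handled independently; which option applies is a deployment choice that the description leaves open.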

In this embodiment of the present invention, the processor type corresponding to each operator can be determined according to the attribute information of the operator; the neural network computation graph is divided into one or more computation subgraphs according to the processor type corresponding to each operator and the dependencies between the operators, each computation subgraph corresponding to one type of processor; and each computation subgraph is processed by its corresponding processor. This makes effective use of computing resources, improves the processing speed of the neural network computation graph, and addresses the slow processing that arises when certain types of processors cannot process certain operators or process them inefficiently.

FIG. 2 is a flowchart of a specific implementation manner of dividing a neural network calculation graph into one or more calculation subgraphs. Referring to FIG. 2, in some embodiments, step 103 specifically includes steps 201 to 203.

Step 201: Traverse each computing branch of the neural network computation graph along the first direction, where each computing branch includes a plurality of nodes and each node corresponds to an operator.

In some embodiments, the neural network computation graph may include multiple computation branches, each computation branch may include multiple nodes, and each node corresponds to an operator in the computation graph.

Step 202: For the current computing branch, traverse each node on the current computing branch along the second direction, where the first direction and the second direction intersect.

The second direction is the reverse of the processing order of the operators in the computation graph. For example, if the operators in the computation graph are processed from top to bottom, the second direction is from bottom to top; correspondingly, the first direction may be from left to right or from right to left.

Step 203: Based on the traversal order, divide the neural network computation graph into one or more computation subgraphs according to the processor type corresponding to each node, the dependency relationships between the nodes, and the preset division rules.

The dependencies between the multiple nodes (operators) may include: the hierarchical relationship of each operator in the neural network and/or the processing order of each operator.

Taking the neural network as a tree neural network as an example, the dependency relationship may include: parent-child node relationship, sibling node relationship, and the like.

Understandably, in a node tree, parent nodes have children. Child nodes of the same level are called sibling (brother or sister) nodes. The top node is called the root node. Except for the root node, every other node has a parent node.

In the embodiment of the present invention, the processing order of the above-mentioned nodes (operators) refers to the compilation order of each operator in the neural network computation graph, or the compilation and execution order.

In step 203, the traversal order is the order determined by steps 201 and 202, that is, the order in which the computing branches of the computation graph are traversed along the first direction together with the order in which the nodes of the current computing branch are traversed along the second direction. Based on this traversal order, each computing branch and each node of each computing branch are traversed in sequence to determine the computation subgraph to which each node belongs, thereby obtaining one or more computation subgraphs.
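The traversal order of steps 201 and 202 can be sketched as follows, under two assumptions that the description does not state explicitly: a computing branch is taken to be one root-to-leaf path, and the first direction (left to right) is represented by the order in which each node's children are listed. The small graph and the names `CHILDREN` and `branches` are hypothetical.

```python
# Hypothetical graph: each node maps to its children, listed left to right.
CHILDREN = {
    "root": ["left", "right"],
    "left": ["leaf"],
    "right": ["leaf"],
    "leaf": [],
}


def branches(children, root):
    """Yield each root-to-leaf path, leftmost path first (first direction),
    with its nodes listed from bottom to top (second direction)."""
    def walk(node, path):
        path = path + [node]
        if not children[node]:      # reached a leaf: one branch is complete
            yield path[::-1]        # reverse so the leaf comes first
            return
        for child in children[node]:
            yield from walk(child, path)

    yield from walk(root, [])


for branch in branches(CHILDREN, "root"):
    print(" -> ".join(branch))
```

Because the shared node `leaf` appears on both branches, a division procedure built on this order has to skip nodes it has already visited when it reaches a later branch.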

In some embodiments, the preset division rules may include: a first division rule.

The first division rule: for the current node in the current computing branch, if the processor types corresponding to the current node and the previous node are the same, the current node is divided into the computation subgraph where the previous node is located; if the processor types corresponding to the current node and the previous node are different, the current node and the previous node are divided into different computation subgraphs.

It can be understood that if the current node is the first node of the first computing branch in the traversal order, it has no previous node; the current node is directly divided into a computation subgraph, and the next node is traversed. If the current node is not the first node of the first computing branch in the traversal order, then, according to the traversal order, the dependency between the current node and its previous node is a parent-child relationship in which the current node is the parent node and the previous node is the child node.
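Applied to a single branch in isolation, the first division rule amounts to grouping consecutive nodes that share a processor type. The following minimal sketch shows only that rule; the function name `split_branch` and the node names are hypothetical.

```python
def split_branch(branch, ptype):
    """Apply the first division rule to one branch.

    `branch` lists nodes in traversal order; `ptype` maps node -> type.
    """
    subgraphs = []
    for node in branch:
        previous = subgraphs[-1][-1] if subgraphs else None
        if previous is not None and ptype[previous] == ptype[node]:
            subgraphs[-1].append(node)   # same type: join previous node's subgraph
        else:
            subgraphs.append([node])     # first node, or type change: new subgraph
    return subgraphs


ptype = {"a": "APU", "b": "APU", "c": "CPU", "d": "APU"}
print(split_branch(["a", "b", "c", "d"], ptype))
```

Here `a` and `b` share a subgraph, `c` starts a new one because its type differs, and `d` starts a third because it differs from `c`, even though it has the same type as the first subgraph.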

In the embodiment of the present invention, the nodes included in different calculation subgraphs are different, that is, the same nodes do not exist between different calculation subgraphs.

In some embodiments, the preset division rule may further include: a second division rule.

The second division rule: if the next node of the current node has multiple child nodes in the neural network computation graph and the current node is not the last child node of the next node along the first direction, the next node is not traversed, and the nodes of the next computing branch are traversed instead; if the next node of the current node has multiple child nodes in the neural network computation graph and the current node is the last child node of the next node along the first direction, traversal continues with the next node in the current computing branch.

It can be understood that, according to the traversal order, the dependency relationship between the current node and the next node of the current node is a parent-child node relationship, the next node is a parent node, and the current node is a child node.
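The second division rule reduces to a single check made before moving from a node up to its parent. In the sketch below, the function name `may_continue_to_parent` is hypothetical, and "last child along the first direction" is assumed to mean the last entry in the parent's left-to-right child list.

```python
def may_continue_to_parent(current, parent, children):
    """Second division rule: a parent with several children is traversed
    only when it is reached from its last child; otherwise traversal
    switches to the next computing branch."""
    siblings = children[parent]
    if len(siblings) <= 1:
        return True                  # the rule only concerns multi-child parents
    return siblings[-1] == current


children = {"p": ["c1", "c2", "c3"], "q": ["only"]}
print(may_continue_to_parent("c1", "p", children))    # False: switch branch
print(may_continue_to_parent("c3", "p", children))    # True: keep going up
print(may_continue_to_parent("only", "q", children))  # True: single child
```

Deferring a shared parent until its last child has been handled means every child's subgraph is already known by the time the parent itself is placed.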

In some embodiments, the preset division rule may further include: a third division rule.

The third division rule: if the computation subgraph where the current node is located has multiple parent nodes in the neural network computation graph, and the processor type corresponding to at least one of those parent nodes differs from the processor type corresponding to the current node, then every one of those parent nodes is determined to belong to a computation subgraph different from that of the current node.
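One way to read the third division rule is as a test on the set of parent nodes that lie outside a subgraph. The sketch below follows that reading; the helper names `subgraph_parents` and `must_stay_separate` and the tiny example graph are assumptions, not terms from the description.

```python
def subgraph_parents(members, parents):
    """Nodes outside `members` that are a parent of some member."""
    found = []
    for node in members:
        for p in parents[node]:
            if p not in members and p not in found:
                found.append(p)
    return found


def must_stay_separate(members, parents, ptype):
    """Third division rule: True when the subgraph has several parent nodes
    and at least one of them has a different processor type, in which case
    none of those parent nodes may join the subgraph."""
    outside = subgraph_parents(members, parents)
    subgraph_type = ptype[members[0]]   # all members share one type
    return len(outside) > 1 and any(ptype[p] != subgraph_type for p in outside)


parents = {"x": ["p", "q"], "p": [], "q": []}
mixed = {"x": "APU", "p": "APU", "q": "CPU"}
uniform = {"x": "APU", "p": "APU", "q": "APU"}
print(must_stay_separate(["x"], parents, mixed))    # True
print(must_stay_separate(["x"], parents, uniform))  # False
```

In the mixed case, parent `p` is kept out of the subgraph even though its type matches, because the subgraph also receives input from `q`, which runs on a different processor type.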

It can be understood that, in this embodiment of the present invention, the neural network computation graph may be divided into one or more computation subgraphs based on the foregoing division rules.

For example, if the neural network computation graph includes multiple branches, and the operator of the parent node and the operator of the child node on each branch have the same processor type, the neural network computation graph can be divided into a single computation subgraph; that is, the computation subgraph is the neural network computation graph itself. As another example, if the neural network computation graph includes multiple branches, and the operator of the parent node and the operator of the child node on each branch have different processor types, the neural network computation graph can be divided into multiple computation subgraphs based on the above division rules.

FIG. 3 is a schematic diagram of a neural network computation graph. As an example, the neural network computation graph includes nodes N1 to N9, the processing order of the nodes is from top to bottom, and the dependency relationships are as shown in FIG. 3. The preset division rules in the embodiment of the present invention are illustrated below, taking the second direction as bottom to top and the first direction as left to right.

Referring to FIG. 3, as an example, it is assumed that, according to steps 101 and 102, the processor type corresponding to the operator of node N7 is determined to be CPU, and the processor type corresponding to the operators of the other nodes (N1 to N6, N8 and N9) is determined to be APU. All operators in the network are divided below based on the processor type corresponding to each operator, the dependencies between the operators, and the preset division rules.

Based on the traversal order, the first computing branch is the leftmost branch, that is, the branch formed by nodes N6, N5, N4, N3 and N2. The first node of this branch in the traversal order is N6. Since N6 is the first node of the first computing branch, node N6 is divided into a computation subgraph, and traversal continues with the next node N5. Node N5 is the parent node of node N6, and the processor types corresponding to node N5 and node N6 are the same. Based on the first division rule, node N5 and node N6 are divided into the same computation subgraph, which is defined as computation subgraph A.

According to the traversal order, traversal continues with node N4, the next node after node N5 in the first computing branch. Node N4 is the parent node of node N5, and the processor types corresponding to N4 and N5 are the same. According to the first division rule, node N4 is divided into computation subgraph A.

According to the traversal order, continue to traverse the next node N3 of the node N4 of the first calculation branch, the node N3 is the parent node of the node N4, and the processor type corresponding to the node N3 is the same as the processor type corresponding to the calculation subgraph A, According to the first division rule, the node N3 is divided into the calculation subgraph A.

According to the traversal order, the parent node of node N3 in the first computing branch is node N2. Node N2 has multiple child nodes (nodes N3, N7 and N9), and node N3 is not the last child node of node N2 along the first direction; the last child node of node N2 along the first direction is node N9. Therefore, according to the second division rule, node N2 is not traversed for the first computing branch; instead, the nodes of the next computing branch are traversed, namely the branch formed by node N6, node N8, node N7 and node N2.

In the branch formed by node N6, node N8, node N7 and node N2, node N6 has already been traversed and need not be traversed again. According to the traversal order, node N8 is traversed next. The processor type corresponding to node N8 is the same as that of node N6, that is, the same as the processor type corresponding to computation subgraph A, so node N8 is divided into computation subgraph A according to the first division rule.

According to the traversal order, traversal continues with node N7, the next node after node N8. The processor type corresponding to node N7 is different from the processor type corresponding to node N8. According to the first division rule, node N7 is divided into a separate computation subgraph, which is defined as computation subgraph B.

According to the traversal order, the parent node of node N7, namely node N2, has multiple child nodes (nodes N3, N7 and N9), and node N7 is not the last child node of node N2 along the first direction. According to the second division rule, node N2 is not traversed, and traversal continues with the nodes of the next computing branch, that is, the branch formed by nodes N6, N8, N9, N2 and N1.

In the branch formed by node N6, node N8, node N9, node N2 and node N1, node N6 and node N8 have already been traversed and need not be traversed again. According to the traversal order, node N9 is traversed next. Computation subgraph A has three parent nodes, namely node N2, node N7 and node N9, and the processor type corresponding to node N7 differs from that of computation subgraph A: node N7 corresponds to CPU, while computation subgraph A corresponds to APU. According to the third division rule, each of node N2, node N7 and node N9 should belong to a computation subgraph different from computation subgraph A. Therefore, although the processor type corresponding to node N9 is the same as that of computation subgraph A, node N9 is not divided into computation subgraph A but into a separate computation subgraph, which is defined as computation subgraph C.

According to the traversal order, the parent node of node N9, namely node N2, has multiple child nodes (nodes N3, N7 and N9), and node N9 is the last child node of node N2 along the first direction. Therefore, according to the second division rule, node N2 is traversed. Since the processor types corresponding to node N2 and node N9 are the same, node N2 is divided into computation subgraph C according to the first division rule.

According to the traversal order, the root node N1 and the node N2 have the same processor type, and the node N1 is divided into the calculation subgraph C according to the first division rule.

At this point, computation subgraph A, computation subgraph B and computation subgraph C have been obtained. Computation subgraph A and computation subgraph C have the same processor type, APU, and the processor type corresponding to computation subgraph B is CPU.
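The walkthrough above can be reproduced with a short sketch that combines the traversal order and the three division rules. It is one possible reading of the rules, not the claimed implementation. The edge list `CHILDREN` is inferred from the walkthrough because FIG. 3 itself is not reproduced here, and the function names are hypothetical.

```python
# Edges inferred from the walkthrough; children are listed left to right.
CHILDREN = {
    "N1": ["N2"], "N2": ["N3", "N7", "N9"],
    "N3": ["N4"], "N4": ["N5"], "N5": ["N6"],
    "N7": ["N8"], "N9": ["N8"], "N8": ["N6"], "N6": [],
}
PTYPE = {node: "APU" for node in CHILDREN}
PTYPE["N7"] = "CPU"


def branches(children, root):
    """Root-to-leaf paths, leftmost first, each listed bottom to top."""
    def walk(node, path):
        path = path + [node]
        if not children[node]:
            yield path[::-1]
            return
        for child in children[node]:
            yield from walk(child, path)
    yield from walk(root, [])


def partition(children, root, ptype):
    parents = {node: [] for node in children}
    for node, kids in children.items():
        for kid in kids:
            parents[kid].append(node)

    subgraphs = []   # each subgraph lists its nodes in traversal order
    owner = {}       # node -> index of its subgraph
    for branch in branches(children, root):
        prev = None
        for node in branch:
            if node in owner:        # already traversed on an earlier branch
                prev = node
                continue
            kids = children[node]
            # Second rule: a multi-child parent is reached only from its last child.
            if prev is not None and len(kids) > 1 and kids[-1] != prev:
                break
            target = None
            if prev is not None and ptype[prev] == ptype[node]:  # first rule
                candidate = owner[prev]
                members = set(subgraphs[candidate])
                outside = {p for m in members for p in parents[m]} - members
                # Third rule: several parents, at least one of another type.
                blocked = len(outside) > 1 and any(
                    ptype[p] != ptype[node] for p in outside)
                if not blocked:
                    target = candidate
            if target is None:       # start a new subgraph
                subgraphs.append([])
                target = len(subgraphs) - 1
            subgraphs[target].append(node)
            owner[node] = target
            prev = node
    return subgraphs


for name, nodes in zip("ABC", partition(CHILDREN, "N1", PTYPE)):
    print(name, nodes, PTYPE[nodes[0]])
```

With the assumed edges, the sketch yields subgraph A containing N6, N5, N4, N3 and N8, subgraph B containing N7, and subgraph C containing N9, N2 and N1, which matches the result described in the text.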

In some embodiments, computational subgraphs A and C may be processed by the same APU, while computational subgraph B may be processed by the CPU. In some embodiments, computational subgraph A may be processed by one APU, computational subgraph C may be processed by another APU, and computational subgraph B may be processed by the CPU.

Dividing the computation subgraphs according to the above division rules facilitates compiling and running the computation graph in its processing order and facilitates configuring the input and output relationships between processors. At the same time, each computation subgraph corresponds to one type of processor, and the processor type corresponding to each computation subgraph is the processor type determined from the attribute information of the operators in that computation subgraph. This makes effective use of computing resources, improves the processing speed of the neural network computation graph, and addresses the slow processing that arises when certain types of processors cannot process certain operators or process them inefficiently.

FIG. 4 is a flowchart of a specific implementation manner of processing a calculation subgraph by a processor corresponding to the calculation subgraph. In some embodiments, as shown in FIG. 4, the step of processing the computation subgraph by the processor corresponding to the computation subgraph, that is, step 104, may further include steps 401 to 403.

Step 401: According to the dependencies among the multiple operators in the neural network computation graph, determine the dependencies between the respective computation subgraphs.

Step 402: Obtain the processing sequence of each calculation subgraph according to the dependency relationship between each calculation subgraph.

The above-mentioned dependencies between the calculation subgraphs are used to represent the hierarchical relationship or processing order of the calculation subgraphs in the calculation graph. Based on the dependencies between the calculation subgraphs, the compilation sequence or the compilation and execution sequence of each calculation subgraph can be determined.

For example, as shown in FIG. 3, computation subgraph A, computation subgraph B and computation subgraph C are obtained by the above division method. From the dependencies between the operators in computation subgraph C, computation subgraph B and computation subgraph A, it can be seen that the output of computation subgraph C is connected to node N7 of computation subgraph B and to nodes N3 and N8 of computation subgraph A, and the output of computation subgraph B is connected to node N8 of computation subgraph A.
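The dependencies between subgraphs in this example can be derived mechanically from the operator-level edges, by keeping only the edges whose two endpoints lie in different subgraphs. In the sketch below, the edge list is again inferred from the walkthrough of FIG. 3 and the function name `subgraph_edges` is hypothetical.

```python
# Operator-level edges (parent -> children), inferred from the walkthrough.
CHILDREN = {
    "N1": ["N2"], "N2": ["N3", "N7", "N9"],
    "N3": ["N4"], "N4": ["N5"], "N5": ["N6"],
    "N7": ["N8"], "N9": ["N8"], "N8": ["N6"], "N6": [],
}
# Result of the division described in the text.
SUBGRAPH_OF = {
    "N6": "A", "N5": "A", "N4": "A", "N3": "A", "N8": "A",
    "N7": "B",
    "N9": "C", "N2": "C", "N1": "C",
}


def subgraph_edges(children, subgraph_of):
    """Map (source subgraph, target subgraph) to the operator edges
    that cross from the source into the target."""
    edges = {}
    for node, kids in children.items():
        for kid in kids:
            src, dst = subgraph_of[node], subgraph_of[kid]
            if src != dst:           # only edges crossing a subgraph boundary
                edges.setdefault((src, dst), []).append((node, kid))
    return edges


for (src, dst), crossing in subgraph_edges(CHILDREN, SUBGRAPH_OF).items():
    print(f"{src} -> {dst}: {crossing}")
```

The crossing edges are exactly the points where the output of one processor has to be wired to the input of another, which is why the subgraph-level dependencies are needed before the processors can be configured.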

Step 403: According to the processing order of the computation subgraphs, process each computation subgraph by the processor corresponding to that computation subgraph.
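Steps 402 and 403 can be sketched as a topological sort over the subgraph dependencies followed by dispatch to the matching processor. The description does not name a sorting technique; a topological sort via the standard-library `graphlib` module (Python 3.9 or later) is an assumption here, as are the names `DEPENDS_ON`, `PROCESSOR`, `processing_order` and `dispatch`.

```python
from graphlib import TopologicalSorter

# For the FIG. 3 example: each subgraph mapped to the subgraphs whose
# output it consumes (C feeds A and B, and B feeds A).
DEPENDS_ON = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
PROCESSOR = {"A": "APU", "B": "CPU", "C": "APU"}


def processing_order(depends_on):
    """Order subgraphs so that each one follows everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def dispatch(order, processor):
    """Stand-in for compiling or running each subgraph on its processor."""
    return [f"{processor[sg]} processes subgraph {sg}" for sg in order]


order = processing_order(DEPENDS_ON)
for line in dispatch(order, PROCESSOR):
    print(line)
```

In this example the dependencies form a chain, so the order is fully determined: C first, then B, then A. For graphs with independent subgraphs, several valid orders exist, and independent subgraphs could in principle be handled concurrently on different processors.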

In this embodiment of the present invention, the multiple operators are divided into one or more calculation subgraphs according to their corresponding processor types and the dependencies between operators, and each calculation subgraph is processed by its corresponding processor. This facilitates configuring the input and output relationships between the processors, ensures the accuracy of operator execution in the neural network, and improves processing efficiency.

Referring to FIG. 5, an embodiment of the present invention provides an apparatus 500 for processing a neural network computation graph. The apparatus 500 includes an acquisition module 501, a determination module 502, a division module 503 and a processing module 504.

The acquisition module 501 is configured to obtain attribute information of all operators in the neural network calculation graph.

In some embodiments, the attribute information of the operator includes a combination of one or more of the following: calculation amount, operator function and operator parameter.

The determination module 502 is configured to determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network calculation graph. The division module 503 is configured to divide the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each operator and the dependencies between multiple operators, where each calculation subgraph corresponds to one processor. The processing module 504 is configured to process each calculation subgraph by the processor corresponding to that calculation subgraph.
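By way of a non-limiting illustration, the determination of a processor type from operator attribute information may be sketched in Python as follows. The embodiments do not specify a concrete selection rule. The rule below, the set `APU_SUPPORTED`, the value of `FLOPS_THRESHOLD` and all operator names are assumptions made purely for illustration: an operator is assigned to the APU only if its operator function is supported by the APU and its calculation amount is large, and is otherwise assigned to the CPU.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    function: str          # operator function, e.g. "conv2d"
    flops: int             # calculation amount
    params: dict = field(default_factory=dict)  # operator parameters


# Hypothetical values: which operator functions the APU supports, and
# how much computation justifies offloading an operator to it.
APU_SUPPORTED = {"conv2d", "dense", "pool"}
FLOPS_THRESHOLD = 1_000_000


def select_processor_type(op: Operator) -> str:
    """Return "APU" for supported, computation-heavy operators, else "CPU"."""
    if op.function in APU_SUPPORTED and op.flops >= FLOPS_THRESHOLD:
        return "APU"
    return "CPU"


ops = [
    Operator("fc", "dense", 1024 * 4096),        # large full connection
    Operator("nms", "custom_nms", 5_000_000),    # function not supported by APU
    Operator("tiny_conv", "conv2d", 100),        # supported but very small
]
types = {op.name: select_processor_type(op) for op in ops}
```

Operator parameters are carried in `params` but are not consulted by this simple rule; a fuller rule might, for example, check whether a parameter value falls within the range the accelerator supports.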

In some embodiments, all operators in the computation subgraph correspond to the same processor type. In some embodiments, a plurality of the computing subgraphs corresponding to the same processor type correspond to the same processor.

In some embodiments, the division module 503 is further configured to: traverse each calculation branch of the neural network calculation graph along a first direction, where each calculation branch includes a plurality of nodes and each node corresponds to an operator; for the current calculation branch, traverse each node on the current calculation branch along a second direction, where the first direction and the second direction intersect; and, based on the traversal order, divide the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each node, the dependencies among the nodes, and a preset division rule.

In some embodiments, the preset division rule includes: a first division rule.

The first division rule includes: for the current node in the current calculation branch, if the current node and the previous node correspond to the same processor type, the current node is assigned to the calculation subgraph in which the previous node is located; if the current node and the previous node correspond to different processor types, the current node and the previous node are assigned to different calculation subgraphs; wherein no node belongs to more than one calculation subgraph.
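By way of a non-limiting illustration, the first division rule may be sketched in Python for a single calculation branch as follows. The sketch assumes that the nodes of the branch are given in traversal order and that the processor type of each node has already been determined. The function name, node names and processor types are hypothetical. The sketch handles only one linear branch and does not cover the second or third division rule.

```python
def split_branch(branch, proc_type):
    """Apply the first division rule to one calculation branch.

    branch: list of node names in traversal order.
    proc_type: dict mapping node name -> processor type.
    Returns a list of subgraphs, each a list of consecutive nodes that
    share one processor type. Every node lands in exactly one subgraph.
    """
    subgraphs = []
    for node in branch:
        if subgraphs and proc_type[subgraphs[-1][-1]] == proc_type[node]:
            # Same type as the previous node: join the previous node's subgraph.
            subgraphs[-1].append(node)
        else:
            # First node, or type differs from the previous node: new subgraph.
            subgraphs.append([node])
    return subgraphs


# Hypothetical branch and processor types, for illustration only.
branch = ["N1", "N2", "N3", "N4"]
proc_type = {"N1": "APU", "N2": "APU", "N3": "CPU", "N4": "APU"}
parts = split_branch(branch, proc_type)
```

Here `N4` starts a new subgraph even though `N1` and `N2` share its processor type, because the rule compares a node only with the node immediately before it on the branch.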

In some embodiments, the preset division rule further includes: a second division rule.

The second division rule includes: if the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is not the last child node of the next node along the first direction, each node of the next calculation branch is traversed; if the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is the last child node of the next node along the first direction, traversal continues with the next node in the current calculation branch.

In some embodiments, the preset division rule further includes: a third division rule.

The third division rule includes: if the calculation subgraph in which the current node is located has multiple parent nodes in the neural network calculation graph, and the processor type corresponding to at least one of the multiple parent nodes is different from the processor type corresponding to the current node, it is determined that any one of the parent nodes and the current node belong to different calculation subgraphs.

In some embodiments, the processing module 504 is further configured to: determine the dependencies between the calculation subgraphs according to the dependencies between the multiple operators in the neural network calculation graph; obtain the processing order of each calculation subgraph according to the dependencies between the calculation subgraphs; and, according to the processing order, process each calculation subgraph by the processor corresponding to that calculation subgraph.

The processing apparatus 500 for processing a neural network computation graph provided by the embodiment of the present invention can implement the various steps and processes of the above-mentioned processing method, and can achieve the same technical effect.

As shown in FIG. 6, an embodiment of the present invention further provides a processing device 600. The processing device 600 includes a processor 601, a memory 602, and a program or instruction stored in the memory 602 and executable on the processor 601. When the program or instruction is executed by the processor 601, each step and process of the above-mentioned processing method is implemented, and the same technical effect can be achieved.

An embodiment of the present invention further provides a readable storage medium on which a program or an instruction is stored. When the program or instruction is executed by a processor, each step and process of the above processing method is implemented, and the same technical effect can be achieved. To avoid repetition, details are not described here again.

Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.

The specific embodiments described above further describe the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above descriptions are only specific embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made on the basis of the technical solution of the present invention shall be included within the protection scope of the present invention.

It should be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.

Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present invention without departing from the spirit and scope of the present invention. Thus, provided that these modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

A method for processing a neural network computation graph, comprising:

determining a processor type corresponding to each operator according to attribute information of each operator in the neural network calculation graph;

dividing the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each of the operators and the dependencies among the plurality of operators, wherein all operators in each calculation subgraph correspond to one processor; and

processing the calculation subgraph by the processor corresponding to the calculation subgraph.
The method according to claim 1, wherein all operators in each of the calculation subgraphs correspond to one processor of one processor type.
The method according to claim 1, wherein a plurality of the calculation subgraphs correspond to the same processor of one processor type.
The method according to claim 1, wherein the step of dividing the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each of the operators and the dependencies among the plurality of operators comprises:

traversing each calculation branch of the neural network calculation graph along a first direction, wherein the calculation branch comprises a plurality of nodes, and each node corresponds to one of the operators;

for the current calculation branch, traversing each of the nodes on the current calculation branch along a second direction, wherein the first direction and the second direction intersect; and

based on the traversal order, dividing the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each of the nodes, the dependencies among the plurality of nodes, and a preset division rule.
The method according to claim 4, wherein the preset division rule comprises:

For the current node in the current computing branch, if the processor type corresponding to the current node and the previous node is the same, the current node is divided into the computing subgraph where the previous node is located;

If the processor types corresponding to the current node and the previous node are different, divide the current node and the previous node into different computing subgraphs;

Wherein no node belongs to more than one calculation subgraph.
The method according to claim 5, wherein the preset division rule further comprises:

If the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is not the last child node of the next node along the first direction, then traverse each node of the next calculation branch;

If the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is the last child node of the next node along the first direction, continue to traverse the next node in the current calculation branch.
The method according to claim 5 or 6, wherein the preset division rule further comprises:

If the calculation subgraph where the current node is located has multiple parent nodes in the neural network calculation graph, and the processor type corresponding to at least one of the multiple parent nodes is different from the processor type corresponding to the current node, it is determined that any one of the parent nodes and the current node belong to different calculation subgraphs.
The method according to claim 1, wherein the step of processing the calculation subgraph by a processor corresponding to the calculation subgraph comprises:

According to the dependencies between a plurality of the operators in the neural network computation graph, determine the dependencies between each of the computation subgraphs;

Obtain the processing order of each of the calculation subgraphs according to the dependencies between the calculation subgraphs;

According to the processing sequence, the calculation sub-graph is processed by the processor corresponding to the calculation sub-graph.
The method according to any one of claims 1-5, wherein the attribute information of the operator comprises a combination of one or more of the following: calculation amount, operator function and operator parameter.
A processing device for a neural network computation graph, comprising:

a determination module, configured to determine the processor type corresponding to each operator according to the attribute information of each operator in the neural network calculation graph;

a division module, configured to divide the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each of the operators and the dependencies between the plurality of operators, wherein each calculation subgraph corresponds to one processor; and

a processing module, configured to process the calculation subgraph by the processor corresponding to the calculation subgraph.
The apparatus according to claim 10, wherein all operators in each of the calculation subgraphs correspond to one processor of one processor type.
The apparatus according to claim 10, wherein a plurality of the calculation subgraphs correspond to the same processor of one processor type.
The device according to claim 10, wherein the division module is configured to: traverse each calculation branch of the neural network calculation graph along a first direction, the calculation branch comprising a plurality of nodes, each node corresponding to one of the operators; for the current calculation branch, traverse each of the nodes on the current calculation branch along a second direction, wherein the first direction and the second direction intersect; and, based on the traversal order, divide the neural network calculation graph into one or more calculation subgraphs according to the processor type corresponding to each of the nodes, the dependencies among the plurality of nodes, and a preset division rule.
The device according to claim 13, wherein the preset division rule comprises:

For the current node in the current computing branch, if the processor type corresponding to the current node and the previous node is the same, the current node is divided into the computing subgraph where the previous node is located;

If the processor types corresponding to the current node and the previous node are different, divide the current node and the previous node into different computing subgraphs;

Wherein no node belongs to more than one calculation subgraph.
The device according to claim 14, wherein the preset division rule comprises:

If the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is not the last child node of the next node along the first direction, then traverse each node of the next calculation branch;

If the next node of the current node has multiple child nodes in the neural network calculation graph, and the current node is the last child node of the next node along the first direction, continue to traverse the next node in the current calculation branch.
The device according to claim 14 or 15, wherein the preset division rule further comprises:

If the calculation subgraph where the current node is located has multiple parent nodes in the neural network calculation graph, and the processor type corresponding to at least one of the multiple parent nodes is different from the processor type corresponding to the current node, it is determined that any one of the parent nodes and the current node belong to different calculation subgraphs.
The apparatus according to claim 10, wherein the processing module is configured to: determine the dependencies between the calculation subgraphs according to the dependencies between the plurality of operators in the neural network calculation graph; obtain the processing order of each of the calculation subgraphs according to the dependencies between the calculation subgraphs; and, according to the processing order, process each calculation subgraph by the processor corresponding to that calculation subgraph.
The apparatus according to any one of claims 10-14, wherein the attribute information of the operator includes a combination of one or more of the following: calculation amount, operator function, and operator parameter.
A processing device, comprising: a processor, a memory, and a program stored on the memory and executable on the processor, wherein, when the program is executed by the processor, the steps of the method for processing a neural network computation graph according to any one of claims 1 to 9 are implemented.
A readable storage medium, wherein a program is stored on the readable storage medium, and when the program is executed by a processor, the steps of the method for processing a neural network computation graph according to any one of claims 1 to 9 are implemented.