CN111860820A - Neural network operator dividing method and device and dividing equipment - Google Patents


Info

Publication number
CN111860820A
CN111860820A
Authority
CN
China
Prior art keywords
neural network
operators
computational
operator
processor
Prior art date
Legal status
Pending
Application number
CN202010757561.1A
Other languages
Chinese (zh)
Inventor
戚海涛
李涵
吴欣洋
张爱飞
丁瑞强
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202010757561.1A priority Critical patent/CN111860820A/en
Publication of CN111860820A publication Critical patent/CN111860820A/en
Priority to PCT/CN2021/109499 priority patent/WO2022022670A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks


Abstract

Embodiments of the invention provide a method, an apparatus, and a device for partitioning neural network operators. The method includes: acquiring attribute information of all operators in a neural network computational graph; determining, according to the attribute information of each operator, the processor corresponding to that operator; dividing the neural network computational graph into one or more computational subgraphs, where each computational subgraph corresponds to one processor; and processing each computational subgraph with its corresponding processor. In the embodiments of the invention, a processor of a suitable type can be selected according to an operator's attribute information and used to process that operator. This improves the processing speed of the neural network computational graph and addresses the slowdown caused by a given type of processor being unable to process certain operators, or processing them inefficiently.

Description

Neural network operator dividing method and device and dividing equipment
Technical Field
Embodiments of the invention relate to the field of network communication technology, and in particular to a method, an apparatus, and a device for partitioning neural network operators.
Background
A neural network is composed of operators, which can be regarded as operation functions, such as convolution and pooling.
Although a Central Processing Unit (CPU) can run all operators, operators such as convolution and full connection (e.g., 1024 × 4096) are computationally heavy, so the CPU runs them slowly and inefficiently. A dedicated Accelerated Processing Unit (APU) can instead be selected to run them. However, an APU typically supports a limited instruction set, and some special operators cannot run on it.
Therefore, how to allocate neural network operators to processors is a technical problem in urgent need of a solution.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a method, an apparatus, and a device for partitioning neural network operators, so as to solve the problem of how to allocate neural network operators to processors.
In a first aspect, a method for partitioning a neural network operator is provided, including:
acquiring attribute information of all operators in a neural network calculation graph;
determining a processor corresponding to each operator according to the attribute information of the operator;
dividing the neural network computational graph into one or more computational subgraphs, wherein all operators in each computational subgraph correspond to one processor;
and processing each computational subgraph through a processor corresponding to the computational subgraph.
Optionally, dividing the neural network computational graph into one or more computational subgraphs, where all operators in each computational subgraph correspond to one processor, includes:
dividing the neural network computational graph into one or more computational subgraphs according to the attribute information of the operators and the association relationships between the operators in the neural network computational graph, where all operators in each computational subgraph correspond to one processor.
Optionally, dividing the neural network computational graph into one or more computational subgraphs according to the attribute information of the operators and the association relationships between the operators in the neural network computational graph includes:
traversing each operator in the neural network computational graph in the reverse of the processing order of the operators in the neural network computational graph;
and dividing the neural network computational graph into one or more computational subgraphs according to preset division rules, based on the attribute information of the operators and the association relationships between the operators in the neural network computational graph.
Optionally, the preset partitioning rule includes one or more of the following combinations:
if an operator serving as a child node has the same attribute information as the operator serving as its parent node, dividing the two operators into the same computational subgraph;
if a computational subgraph has multiple operators serving as parent nodes and the attribute information of those parent-node operators differs, dividing the computational subgraph out separately for processing;
if the parent node of a child node has multiple child nodes and the child node is not the last child branch node of the parent node, not traversing the parent node; and if the child node is the last child branch node of the parent node, traversing the parent node.
Optionally, the processing, by the processor corresponding to the computation subgraph, each computation subgraph includes:
determining the association relationships among the computational subgraphs according to the association relationships among the operators in the neural network computational graph;
obtaining a processing order of the computational subgraphs according to the association relationships among the computational subgraphs;
and processing each computation subgraph through a corresponding processor according to the processing sequence.
Optionally, the attribute information of an operator includes one or more of the following: computation amount, operator function, and operator parameters.
In a second aspect, an apparatus for partitioning a neural network operator is provided, including:
the acquisition module is used for acquiring attribute information of all operators in the neural network calculation graph;
the determining module is used for determining the processor corresponding to each operator according to the attribute information of the operator;
the dividing module is used for dividing the neural network computational graph into one or more computational subgraphs according to the types of the processors, and each computational subgraph corresponds to one type of processor;
and the processing module is used for processing each computational subgraph through the processor corresponding to the computational subgraph.
Optionally, the dividing module is further configured to: divide the neural network computational graph into one or more computational subgraphs according to the attribute information of the operators and the association relationships between the operators in the neural network computational graph, where all operators in each computational subgraph correspond to one processor.
Optionally, the processing module is further configured to: determine the association relationships among the computational subgraphs according to the association relationships among the operators in the neural network computational graph; obtain a processing order of the computational subgraphs according to those association relationships; and process each computational subgraph through the corresponding processor in that order.
Optionally, the attribute information of an operator includes one or more of the following: computation amount, operator function, and operator parameters.
In a third aspect, a partitioning device for neural network operators is provided, including: a processor, a memory and a program stored on the memory and executable on the processor, which program, when executed by the processor, implements the steps of the method of partitioning neural network operators according to the first aspect.
In a fourth aspect, a readable storage medium is provided, on which a program is stored, which when executed by a processor implements the steps of the method for partitioning neural network operators according to the first aspect.
In the embodiments of the invention, a processor of a suitable type can be selected according to an operator's attribute information and used to process that operator. This improves the processing speed of the neural network computational graph and addresses the slowdown caused by a given type of processor being unable to process certain operators, or processing them inefficiently.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart of a method for partitioning neural network operators according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a method for partitioning neural network operators according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the partitioning of neural network operators according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an apparatus for partitioning neural network operators according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a partitioning device for neural network operators according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprises," "comprising," or any other variation thereof, in the description and claims of this application, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, "and/or" in the specification and claims denotes at least one of the connected objects; for example, A and/or B covers three cases: A alone, B alone, and both A and B.
In the embodiments of the present invention, words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
Referring to fig. 1, an embodiment of the present invention provides a method for partitioning a neural network operator, including: step 101, step 102, step 103 and step 104.
Step 101: acquiring attribute information of all operators in a neural network computation graph (computational graph).
It is understood that the neural network may be a computational network consisting of Operators (OPs). Alternatively, the neural network is a tree-structured neural network model, but it is understood that the specific form of the neural network model is not limited.
Optionally, the attribute information of an operator includes one or more of the following: computation amount, operator function, and operator parameters.
Step 102: and determining the processor corresponding to each operator according to the attribute information of the operator.
In the embodiment of the present invention, the types of the processors (which may also be referred to as Processing units) may include a Central Processing Unit (CPU), an Accelerated Processing Unit (APU), a Graphics Processing Unit (GPU), a Tensor Processing Unit (TPU), and the like, but are not limited thereto.
For example, if the attribute information of operator 1 includes one or more of computation amount A, operator function A, and operator parameter A, then operator 1 corresponds to processor 1 (e.g., a central processing unit); if the attribute information of operator 2 includes one or more of computation amount B, operator function B, and operator parameter B, then operator 2 corresponds to processor 2 (e.g., an accelerated processing unit).
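As a concrete illustration of step 102, the attribute-driven choice of processor can be sketched as a small dispatch function. The attribute keys, the cost threshold, and the set of APU-supported functions below are illustrative assumptions, not taken from the patent:

```python
# A hypothetical sketch of step 102: choosing a processor for each operator
# from its attribute information. The attribute keys, the cost threshold,
# and the set of APU-supported functions are illustrative assumptions.
SUPPORTED_BY_APU = {"conv2d", "fully_connected", "pool"}

def assign_processor(op: dict) -> str:
    """Map one operator's attribute info to a processor type."""
    # Operators the accelerator cannot run fall back to the CPU.
    if op["function"] not in SUPPORTED_BY_APU:
        return "CPU"
    # Heavy operators go to the accelerator; light ones stay on the CPU.
    return "APU" if op["compute_cost"] > 1_000_000 else "CPU"

ops = [
    {"name": "N7", "function": "custom_sort", "compute_cost": 5_000},
    {"name": "N5", "function": "conv2d", "compute_cost": 8_000_000},
]
assignment = {op["name"]: assign_processor(op) for op in ops}
```

Any real implementation would derive the supported-operator set and the cost model from the target hardware rather than hard-coding them.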
step 103: and dividing the neural network computation graph into one or more computation subgraphs, wherein all operators in each computation subgraph correspond to one processor.
That is, each computational subgraph corresponds to one processor. For example, the neural network computational graph is divided into computational subgraph 1 and computational subgraph 2, where all operators in computational subgraph 1 correspond to the CPU, that is, all of them can be processed by the CPU, and all operators in computational subgraph 2 correspond to the APU.
It can be understood that, in the embodiment of the present invention, the number of computational subgraphs into which the computational graph is divided is not limited.
Step 104: and processing each computational subgraph through a processor corresponding to the computational subgraph.
That is, each computational subgraph is compiled, or compiled and run, by the processor corresponding to that computational subgraph.
In the embodiments of the invention, a processor of a suitable type can be selected according to an operator's attribute information and used to process that operator. This improves the processing speed of the neural network computational graph and addresses the slowdown caused by a given type of processor being unable to process certain operators, or processing them inefficiently.
Referring to fig. 2, an embodiment of the present invention provides a method for partitioning a neural network operator, including: step 201 to step 205.
Step 201: and acquiring attribute information of all operators in the neural network computation graph.
Step 202: and dividing the neural network computation graph into one or more computation subgraphs according to the attribute information of the operators and the incidence relation between the operators in the neural network computation graph, wherein all the operators in each computation subgraph correspond to one processor.
The association relationship includes: the hierarchical relation of each operator in the neural network and/or the processing sequence of each operator.
Taking the neural network as a tree neural network as an example, the association relationship may include: parent-child node relationships, sibling node relationships, and the like.
It is understood that, in a node tree, a parent node owns its child nodes; child nodes of the same parent are called sibling nodes; the top node is called the root; and every node except the root (which has no parent) has exactly one parent.
The processing sequence of the operators refers to the compiling sequence or the compiling and running sequence of each operator in the neural network computation graph.
Optionally, each operator in the neural network computational graph is traversed from bottom to top and from left to right, i.e., in the reverse of the processing order of the operators, and the neural network computational graph is divided into one or more computational subgraphs according to preset division rules, based on the attribute information of the operators and the association relationships between the operators.
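The reverse-of-processing-order traversal described above can be obtained by topologically sorting the operators and reversing the result. A sketch using Python's standard-library sorter, with an assumed chain-shaped graph loosely modelled on the Figure 3 naming:

```python
# A sketch of the reverse-of-processing-order traversal: topologically sort
# the operators (predecessors first), then reverse. The edge list is an
# assumed chain, not the patent's actual graph.
from graphlib import TopologicalSorter

# predecessors[n] = nodes that must be processed before n
predecessors = {
    "N2": {"N1"}, "N3": {"N2"}, "N4": {"N3"},
    "N5": {"N4"}, "N6": {"N5"},
}
processing_order = list(TopologicalSorter(predecessors).static_order())
reverse_traversal = list(reversed(processing_order))  # leaves first, root last
```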
Optionally, the division rules include one or more of the following:
Rule 1: if an operator serving as a child node has the same attribute information as the operator serving as its parent node, divide the two operators into the same computational subgraph;
Rule 2: if a computational subgraph has multiple operators serving as parent nodes and the attribute information of those parent-node operators is not identical, divide the computational subgraph out separately for processing;
Rule 3: if the parent node of a child node has multiple child nodes and the child node is not the last child branch node of the parent node, do not traverse the parent node; if the child node is the last child branch node of the parent node, traverse the parent node.
It is understood that, in the embodiment of the present invention, whether to divide the neural network computational graph into one computational subgraph or multiple computational subgraphs may be determined based on the above-mentioned division rule.
For example, if the neural network computational graph includes multiple branches and, on every branch, the parent-node operator and the child-node operators have the same attribute information, the neural network computational graph can be divided into a single computational subgraph, i.e., the computational subgraph is the whole neural network computational graph. If instead the attribute information of the parent-node and child-node operators on the branches is not all the same, the neural network computational graph can be divided into multiple computational subgraphs based on the above division rules.
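A much-simplified version of this division idea, merging neighbouring operators that map to the same processor into connected components, can be sketched with a union-find structure. Note that this approximates rules 1 and 2 only; the multi-source and last-child-branch handling of rule 3 is deliberately omitted, so on graphs like Figure 3 it can merge subgraphs that the patent's rules keep apart:

```python
# Simplified sketch of subgraph division: merge adjacent operators with the
# same processor type into one connected component (computational subgraph).
# This is an approximation of rules 1-2; rule 3 is intentionally not modelled.

def divide(edges, processor_of):
    """edges: list of (parent, child); processor_of: node -> processor type."""
    parent_ptr = {}  # union-find parent pointers

    def find(x):
        while parent_ptr.setdefault(x, x) != x:
            x = parent_ptr[x]
        return x

    for parent, child in edges:
        if processor_of[parent] == processor_of[child]:
            parent_ptr[find(parent)] = find(child)  # merge components

    groups = {}
    for node in processor_of:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

edges = [("N1", "N2"), ("N2", "N3"), ("N2", "N7")]
proc = {"N1": "APU", "N2": "APU", "N3": "APU", "N7": "CPU"}
subgraphs = divide(edges, proc)  # {N1, N2, N3} on the APU, {N7} on the CPU
```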
Step 203: determining the association relationships among the computational subgraphs according to the association relationships among the operators in the neural network computational graph.
The association relationships among the computational subgraphs represent the hierarchical relationship of the subgraphs, and the compiling order, or compiling-and-running order, of each computational subgraph can be determined from them.
Step 204: and obtaining the processing sequence of each computational subgraph according to the association relation among the computational subgraphs.
Step 205: and processing each computation subgraph through a processor of a corresponding type according to the processing sequence.
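Steps 203 to 205 amount to lifting the operator-level edges to subgraph-level edges and then topologically sorting the subgraphs. A sketch under assumed names and an assumed example partition:

```python
# Steps 203-205 sketched: derive subgraph-level predecessor relations from
# the operator-level edges, then topologically sort the subgraphs to obtain
# a processing order. The edges and partition here are illustrative.
from graphlib import TopologicalSorter

op_edges = [("N1", "N2"), ("N2", "N3"), ("N2", "N7"), ("N7", "N8")]
subgraph_of = {"N1": "A", "N2": "A", "N3": "A", "N7": "B", "N8": "C"}

# Subgraph-level predecessor map, skipping edges internal to one subgraph.
preds = {sg: set() for sg in set(subgraph_of.values())}
for src, dst in op_edges:
    if subgraph_of[src] != subgraph_of[dst]:
        preds[subgraph_of[dst]].add(subgraph_of[src])

order = list(TopologicalSorter(preds).static_order())
```

Each subgraph in `order` would then be handed to its corresponding processor in turn.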
In the embodiments of the invention, operators with the same attribute information are divided into the same computational subgraph region, and the subgraphs are processed according to their association relationships, which both guarantees the correctness of operator computation in the neural network and improves processing efficiency.
Referring to fig. 3, suppose that according to steps 201 and 202 operator N7 is to be processed by the CPU and the other operators (N1 to N6, N8, and N9) by the APU; all operators in the network then need to be divided.
The attribute information (or type) of N6 is the same as that of N5, the parent node of N6 (rule 1), so N5 and N6 are drawn into the same computational subgraph, defined as computational subgraph A;
N4 is of the same type as N5 (rule 1), so N4 and (N5 + N6) are also drawn into computational subgraph A;
the type of the processor corresponding to N3 is the same as that of the processor corresponding to computational subgraph A, so N3 and (N4 + N5 + N6) are divided into computational subgraph A;
the parent node N2 of N3 has multiple child nodes and N3 is not the last child node of N2, so node N2 is not traversed;
the branch N6-N8-N7-N2 is traversed; the type of the processor corresponding to N8 is the same as that of the processor corresponding to computational subgraph A, so N8 and (N3 + N4 + N5 + N6) are divided into computational subgraph A;
the type of the processor corresponding to N7 differs from that of the processor corresponding to computational subgraph A, so N7 is divided out separately as computational subgraph B;
the parent node N2 of N7 has multiple child nodes and N7 is not the last child node of N2, so node N2 is not traversed;
the branch N8-N9-N2 is traversed;
computational subgraph A now has three sources, N2, N7, and N9, and these sources are of different types (N7 belongs to computational subgraph B); therefore, although the type of the processor corresponding to N9 is the same as that of the processor corresponding to computational subgraph A, N9 is not divided into computational subgraph A but into computational subgraph C;
since N9 is the last child node of N2, node N2 is traversed, and since N2 corresponds to the same kind of processor, N2 is drawn into computational subgraph C.
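The outcome of this worked example can be restated as plain data and sanity-checked: each computational subgraph must map to exactly one processor type. (The text does not state which subgraph N1 ends up in, so it is omitted below; the operator-to-processor mapping restates the example's assumptions.)

```python
# Figure 3 outcome restated as data: subgraphs A and C run on the APU,
# subgraph B (the unsupported operator N7) on the CPU. N1's final subgraph
# is not stated in the example, so it is left out here.
subgraphs = {
    "A": {"N3", "N4", "N5", "N6", "N8"},
    "B": {"N7"},
    "C": {"N2", "N9"},
}
processor_of = {n: ("CPU" if n == "N7" else "APU")
                for nodes in subgraphs.values() for n in nodes}

def processors_used(nodes):
    """Set of processor types appearing in one subgraph."""
    return {processor_of[n] for n in nodes}

# Each subgraph must involve exactly one processor type.
consistent = all(len(processors_used(nodes)) == 1
                 for nodes in subgraphs.values())
```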
In the embodiments of the invention, a processor of the corresponding type can be selected according to an operator's attribute information and used to process that operator. This improves the processing speed of the neural network computational graph and addresses the slowdown caused by a certain type of processor being unable to process some operators, or processing them inefficiently.
Referring to fig. 4, an embodiment of the present invention provides an apparatus for partitioning a neural network operator, where the apparatus 400 includes:
an obtaining module 401, configured to obtain attribute information of all operators in the neural network computation graph;
optionally, the attribute information of the operator includes one or more of the following combinations: the calculation amount, the operator function and the operator parameter.
A determining module 402, configured to determine, according to the attribute information of the operators, a processor corresponding to each operator;
a dividing module 403, configured to divide the neural network computational graph into one or more computational subgraphs according to the types of the processors, where each computational subgraph corresponds to one type of processor.
And the processing module 404 is configured to process each computational subgraph through a processor corresponding to the computational subgraph.
In some embodiments, the partitioning module 403 is further configured to: divide the neural network computational graph into one or more computational subgraphs according to the attribute information of the operators and the association relationships between the operators in the neural network computational graph, where all operators in each computational subgraph correspond to one processor.
In some embodiments, the partitioning module 403 is further configured to: traverse each operator in the neural network computational graph in the reverse of the processing order of the operators; and divide the neural network computational graph into one or more computational subgraphs according to preset division rules, based on the attribute information of the operators and the association relationships between the operators in the neural network computational graph.
Optionally, the preset division rules include one or more of the following:
if an operator serving as a child node has the same attribute information as the operator serving as its parent node, the two operators are divided into the same computational subgraph;
if a computational subgraph has multiple operators serving as parent nodes and the attribute information of those parent-node operators differs, the computational subgraph is divided out separately for processing;
if the parent node of a child node has multiple child nodes and the child node is not the last child branch node of the parent node, the parent node is not traversed; if the child node is the last child branch node of the parent node, the parent node is traversed.
In some embodiments, the processing module 404 is further configured to: determine the association relationships among the computational subgraphs according to the association relationships among the operators in the neural network computational graph; obtain a processing order of the computational subgraphs according to those association relationships; and process each computational subgraph through the corresponding processor in that order.
The partitioning device for neural network operators provided by the embodiment of the present invention can implement each process implemented by the method embodiments shown in fig. 1 and fig. 2, and achieve the same technical effect, and is not described here again to avoid repetition.
As shown in fig. 5, an embodiment of the present application further provides a dividing apparatus for a neural network operator, where the dividing apparatus 500 includes a processor 501, a memory 502, and a program or an instruction stored in the memory 502 and capable of being executed on the processor 501, and when the program or the instruction is executed by the processor 501, the program or the instruction implements each process of the method embodiment shown in fig. 1 or fig. 2, and can achieve the same technical effect, and is not described herein again to avoid repetition.
An embodiment of the present invention further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the method embodiment shown in fig. 1 or fig. 2, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (12)

1. A method for partitioning neural network operators, comprising:
acquiring attribute information of all operators in a neural network computational graph;
determining a processor corresponding to each operator according to the attribute information of that operator;
dividing the neural network computational graph into one or more computational subgraphs, wherein all operators in each computational subgraph correspond to one processor; and
processing each computational subgraph through the processor corresponding to that computational subgraph.
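For illustration only, the four steps of claim 1 can be sketched as follows. The attribute names, processor labels, and the placement rule in `assign_processor` are hypothetical examples, not taken from the patent itself.

```python
from collections import defaultdict

def assign_processor(attrs):
    # Hypothetical placement rule: convolution operators go to an NPU,
    # all other operators stay on the CPU.
    return "NPU" if attrs.get("op_type") == "conv" else "CPU"

def partition(graph):
    """graph: dict mapping operator name -> attribute dict."""
    # Steps 1-2: acquire each operator's attributes, pick its processor.
    placement = {op: assign_processor(a) for op, a in graph.items()}
    # Step 3: group operators so each subgraph targets one processor.
    subgraphs = defaultdict(list)
    for op, proc in placement.items():
        subgraphs[proc].append(op)
    # Step 4 (dispatching each subgraph to its processor) is omitted.
    return dict(subgraphs)

graph = {
    "conv1": {"op_type": "conv"},
    "relu1": {"op_type": "relu"},
    "conv2": {"op_type": "conv"},
}
print(partition(graph))  # {'NPU': ['conv1', 'conv2'], 'CPU': ['relu1']}
```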
2. The method of claim 1, wherein dividing the neural network computational graph into one or more computational subgraphs, with all operators in each computational subgraph corresponding to one processor, comprises:
dividing the neural network computational graph into one or more computational subgraphs according to the attribute information of the operators and the association relationships between the operators in the neural network computational graph, wherein all operators in each computational subgraph correspond to one processor.
3. The method according to claim 2, wherein dividing the neural network computational graph into one or more computational subgraphs according to the attribute information of the operators and the association relationships between the operators comprises:
traversing each operator in the neural network computational graph in the reverse of the order in which the operators are processed; and
dividing the neural network computational graph into one or more computational subgraphs according to preset partitioning rules, based on the attribute information of the operators and the association relationships between the operators in the neural network computational graph.
4. The method of claim 3, wherein the preset partitioning rules comprise one or more of the following:
if an operator serving as a child node has the same attribute information as an operator serving as its parent node, the two operators are divided into the same computational subgraph;
if a computational subgraph has a plurality of operators serving as parent nodes and the attribute information of those parent-node operators differs, the computational subgraph is partitioned off separately for processing; and
if the parent node of a child node has a plurality of child nodes, the parent node is not traversed unless the child node is the last child branch node of the parent node.
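A minimal sketch of the first rule of claim 4, combined with the reverse-order traversal of claim 3: operators are visited opposite to their processing order, and a parent joins its child's subgraph when their attribute information matches. The graph shape and attribute values below are hypothetical.

```python
def partition_by_rules(order, children, attrs):
    """order: operators in processing order; children: op -> child ops;
    attrs: op -> attribute value (e.g. target processor type)."""
    subgraph_of = {}
    next_id = 0
    for op in reversed(order):  # children are visited before parents
        merged = None
        for child in children.get(op, []):
            # Rule: a parent is merged into a child's subgraph when
            # their attribute information is the same.
            if attrs[child] == attrs[op]:
                merged = subgraph_of[child]
                break
        if merged is None:      # otherwise start a new subgraph
            merged = next_id
            next_id += 1
        subgraph_of[op] = merged
    return subgraph_of

order = ["conv1", "relu1", "conv2"]          # processing order
children = {"conv1": ["relu1"], "relu1": ["conv2"]}
attrs = {"conv1": "NPU", "relu1": "NPU", "conv2": "CPU"}
sg = partition_by_rules(order, children, attrs)
# conv1 and relu1 share a subgraph; conv2 lands in a separate one.
```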
5. The method of claim 1, wherein processing each computational subgraph through the processor corresponding to that computational subgraph comprises:
determining the association relationships among the computational subgraphs according to the association relationships among the operators in the neural network computational graph;
obtaining a processing order of the computational subgraphs according to the association relationships among the computational subgraphs; and
processing each computational subgraph through its corresponding processor according to the processing order.
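Claim 5 derives a processing order for the subgraphs from their association relationships; a topological sort over subgraph dependencies is one natural reading of this step. The subgraph names and dependency edges below are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def subgraph_order(deps):
    """deps: subgraph -> set of subgraphs whose output it consumes."""
    return list(TopologicalSorter(deps).static_order())

# Subgraph B consumes A's output, and C consumes B's output.
order = subgraph_order({"B": {"A"}, "C": {"B"}})
print(order)  # ['A', 'B', 'C']
```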
6. The method according to any of claims 1-5, wherein the attribute information of an operator comprises one or more of: computation amount, operator function, and operator parameters.
7. An apparatus for partitioning neural network operators, comprising:
an acquisition module, configured to acquire attribute information of all operators in a neural network computational graph;
a determining module, configured to determine a processor corresponding to each operator according to the attribute information of that operator;
a dividing module, configured to divide the neural network computational graph into one or more computational subgraphs according to the types of the processors, wherein each computational subgraph corresponds to one type of processor; and
a processing module, configured to process each computational subgraph through the processor corresponding to that computational subgraph.
8. The apparatus of claim 7, wherein the dividing module is further configured to: divide the neural network computational graph into one or more computational subgraphs according to the attribute information of the operators and the association relationships between the operators in the neural network computational graph, wherein all operators in each computational subgraph correspond to one processor.
9. The apparatus of claim 7, wherein the processing module is further configured to: determine the association relationships among the computational subgraphs according to the association relationships among the operators in the neural network computational graph; obtain a processing order of the computational subgraphs according to the association relationships among the computational subgraphs; and process each computational subgraph through its corresponding processor according to the processing order.
10. The apparatus according to any of claims 7-9, wherein the attribute information of an operator comprises one or more of: computation amount, operator function, and operator parameters.
11. An apparatus for partitioning neural network operators, comprising: a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for partitioning neural network operators according to any one of claims 1 to 6.
12. A readable storage medium, having stored thereon a program which, when executed by a processor, implements the steps of the method for partitioning neural network operators according to any one of claims 1 to 6.
CN202010757561.1A 2020-07-31 2020-07-31 Neural network operator dividing method and device and dividing equipment Pending CN111860820A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010757561.1A CN111860820A (en) 2020-07-31 2020-07-31 Neural network operator dividing method and device and dividing equipment
PCT/CN2021/109499 WO2022022670A1 (en) 2020-07-31 2021-07-30 Neural network computation graph processing method and apparatus, and processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010757561.1A CN111860820A (en) 2020-07-31 2020-07-31 Neural network operator dividing method and device and dividing equipment

Publications (1)

Publication Number Publication Date
CN111860820A true CN111860820A (en) 2020-10-30

Family

ID=72953458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010757561.1A Pending CN111860820A (en) 2020-07-31 2020-07-31 Neural network operator dividing method and device and dividing equipment

Country Status (2)

Country Link
CN (1) CN111860820A (en)
WO (1) WO2022022670A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947933A (en) * 2021-02-24 2021-06-11 上海商汤智能科技有限公司 Operator execution method and device, computer equipment and storage medium
CN113051080A (en) * 2021-04-22 2021-06-29 杭州海康威视数字技术股份有限公司 Computation graph execution method and device and heterogeneous platform
WO2022022670A1 (en) * 2020-07-31 2022-02-03 北京灵汐科技有限公司 Neural network computation graph processing method and apparatus, and processing device
WO2022127603A1 (en) * 2020-12-14 2022-06-23 华为技术有限公司 Model processing method and related device
CN114691330A (en) * 2022-03-28 2022-07-01 北京百度网讯科技有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN114819084A (en) * 2022-04-26 2022-07-29 北京百度网讯科技有限公司 Model reasoning method, device, equipment and storage medium
CN115358379A (en) * 2022-10-20 2022-11-18 腾讯科技(深圳)有限公司 Neural network processing method, neural network processing device, information processing method, information processing device and computer equipment
WO2023116312A1 (en) * 2021-12-24 2023-06-29 Oppo广东移动通信有限公司 Data processing method and apparatus, and computer device and storage medium
WO2023125628A1 (en) * 2021-12-31 2023-07-06 华为技术有限公司 Neural network model optimization method and apparatus, and computing device
WO2024022046A1 (en) * 2022-07-28 2024-02-01 华为技术有限公司 Deep learning system and method

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114816752A (en) * 2022-04-26 2022-07-29 山东云海国创云计算装备产业创新中心有限公司 Memory management method, system, equipment and computer readable storage medium
CN115268877B (en) * 2022-09-27 2022-12-13 之江实验室 Intermediate representation method and device for parallel execution of graph computation
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation
CN115796228B (en) * 2022-11-15 2024-04-05 北京百度网讯科技有限公司 Operator fusion method, device, equipment and storage medium
CN117576125B (en) * 2024-01-16 2024-04-16 芯瞳半导体技术(山东)有限公司 Neural network calculation graph segmentation method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108292241B (en) * 2015-10-28 2022-05-24 谷歌有限责任公司 Processing a computation graph
CN111461296B (en) * 2018-12-29 2023-09-22 中科寒武纪科技股份有限公司 Data processing method, electronic device, and readable storage medium
CN110689115B (en) * 2019-09-24 2023-03-31 安徽寒武纪信息科技有限公司 Neural network model processing method and device, computer equipment and storage medium
CN111062467B (en) * 2019-12-18 2023-05-12 开放智能机器(上海)有限公司 Automatic neural network subgraph segmentation method applied to AI heterogeneous compiler
CN111860820A (en) * 2020-07-31 2020-10-30 北京灵汐科技有限公司 Neural network operator dividing method and device and dividing equipment

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022022670A1 (en) * 2020-07-31 2022-02-03 北京灵汐科技有限公司 Neural network computation graph processing method and apparatus, and processing device
WO2022127603A1 (en) * 2020-12-14 2022-06-23 华为技术有限公司 Model processing method and related device
CN112947933A (en) * 2021-02-24 2021-06-11 上海商汤智能科技有限公司 Operator execution method and device, computer equipment and storage medium
CN113051080A (en) * 2021-04-22 2021-06-29 杭州海康威视数字技术股份有限公司 Computation graph execution method and device and heterogeneous platform
WO2023116312A1 (en) * 2021-12-24 2023-06-29 Oppo广东移动通信有限公司 Data processing method and apparatus, and computer device and storage medium
WO2023125628A1 (en) * 2021-12-31 2023-07-06 华为技术有限公司 Neural network model optimization method and apparatus, and computing device
CN114691330A (en) * 2022-03-28 2022-07-01 北京百度网讯科技有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN114819084A (en) * 2022-04-26 2022-07-29 北京百度网讯科技有限公司 Model reasoning method, device, equipment and storage medium
CN114819084B (en) * 2022-04-26 2024-03-01 北京百度网讯科技有限公司 Model reasoning method, device, equipment and storage medium
WO2024022046A1 (en) * 2022-07-28 2024-02-01 华为技术有限公司 Deep learning system and method
CN115358379A (en) * 2022-10-20 2022-11-18 腾讯科技(深圳)有限公司 Neural network processing method, neural network processing device, information processing method, information processing device and computer equipment
CN115358379B (en) * 2022-10-20 2023-01-10 腾讯科技(深圳)有限公司 Neural network processing method, neural network processing device, information processing method, information processing device and computer equipment

Also Published As

Publication number Publication date
WO2022022670A1 (en) 2022-02-03

Similar Documents

Publication Publication Date Title
CN111860820A (en) Neural network operator dividing method and device and dividing equipment
CN108829610B (en) Memory management method and device in neural network forward computing process
Malapert et al. A constraint programming approach for a batch processing problem with non-identical job sizes
CN109582716A (en) Data visualization treating method and apparatus
US10671607B2 (en) Pipeline dependent tree query optimizer and scheduler
CN111104111A (en) Layout processing method and device for tree Canvas
CN116467061B (en) Task execution method and device, storage medium and electronic equipment
CN109684005A (en) Component similarity determines method and device in graphical interfaces
CN111191778B (en) Deep learning network processing method, device and compiler
CN104268243A (en) Position data processing method and device
CN113887396A (en) Image processing method and device, computer equipment and storage medium
CN105335135B (en) Data processing method and central node
Aparicio et al. A scalable parallel approach for subgraph census computation
CN111626311A (en) Heterogeneous graph data processing method and device
KR102326586B1 (en) Method and apparatus for processing large-scale distributed matrix product
CN115374914B (en) Distributed training method, parallel deep learning framework and electronic equipment
CN116382658A (en) Compiling method and device of AI model, computer equipment and storage medium
CN116933841A (en) Operator fusion method and device, electronic equipment and computer readable medium
CN112000478B (en) Method and device for distributing operation resources
CN113051080A (en) Computation graph execution method and device and heterogeneous platform
CN112783441A (en) Method and device for adjusting read-write speed limit of virtual machine disk and computing equipment
CN116484768B (en) System dynamics model construction method and device
KR102488614B1 (en) Method, apparatus and computer program for managing virtualized resources
CN109582295A (en) A kind of data processing method, device, storage medium and processor
CN114240729A (en) Point cloud clustering GPU optimization method and device based on graph structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination