CN109284815B - Neural network model algorithm compiling method and device and related products - Google Patents


Info

Publication number
CN109284815B
Authority
CN
China
Prior art keywords
neural network
nodes
fusion point
network accelerator
network model
Prior art date
Legal status
Active
Application number
CN201811456696.3A
Other languages
Chinese (zh)
Other versions
CN109284815A (en)
Inventor
Inventor not publicized (不公告发明人)
Current Assignee
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Anhui Cambricon Information Technology Co Ltd filed Critical Anhui Cambricon Information Technology Co Ltd
Priority to CN201811456696.3A
Publication of CN109284815A
Application granted
Publication of CN109284815B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks


Abstract

The present disclosure relates to a neural network model algorithm compiling method and device and related products. The method comprises: fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point, wherein the computational graph comprises a plurality of nodes, and the nodes comprise neural network accelerator nodes and central processing unit nodes; compiling the first fusion point and the nodes other than the first fusion point separately to obtain an intermediate representation of the computational graph; and obtaining an executable file of the neural network model algorithm according to the intermediate representation of the computational graph. Embodiments of the present disclosure can save system resources, shorten the running time of the neural network model, and improve the computational efficiency of the neural network model.

Description

Neural network model algorithm compiling method and device and related products
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a neural network model algorithm compiling method, device, and related product.
Background
With the continuous development of information technology, neural network models are increasingly required to complete complex tasks; however, the algorithms of neural network models are complex and their computational efficiency is low.
Disclosure of Invention
In view of this, the present disclosure provides a neural network model algorithm compiling method, device and related product, so as to improve the calculation efficiency of the neural network model.
According to an aspect of the present disclosure, there is provided a neural network model algorithm compiling method, the method including:
fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point, wherein the computational graph comprises a plurality of nodes, and the nodes comprise neural network accelerator nodes and central processing unit nodes;
compiling the first fusion point and the nodes other than the first fusion point separately to obtain an intermediate representation of the computational graph;
and obtaining an executable file of the neural network model algorithm according to the intermediate representation of the computational graph.
In a possible implementation manner, fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point includes:
determining at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm as nodes to be fused;
when the input data of the nodes to be fused, other than the first node to be fused and the last node to be fused among the at least two consecutive neural network accelerator nodes, come from neural network accelerator nodes, fusing the nodes to be fused to obtain a first fusion point.
In a possible implementation manner, fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point includes:
determining at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm as nodes to be fused;
when the output data of the nodes to be fused are sent to at least two central processing unit nodes, adding a dependency relationship between the nodes to be fused that send output data to the central processing unit nodes;
and fusing the nodes to be fused to obtain a first fusion point.
In one possible implementation, the method further includes:
fusing at least two consecutive central processing unit nodes to obtain a second fusion point;
wherein compiling the first fusion point and the nodes other than the first fusion point separately to obtain the intermediate representation of the computational graph includes:
compiling the first fusion point, the second fusion point, and the nodes other than the first fusion point and the second fusion point separately to obtain the intermediate representation of the computational graph.
In one possible implementation, the method further includes:
and allocating a buffer space for the first fusion point and/or the second fusion point, wherein the buffer space is not released when the first fusion point and/or the second fusion point completes the operation.
In a possible implementation manner, fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point includes at least one of the following merging manners:
fusing a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node in a computational graph of a neural network model algorithm into a fused batch normalization first fusion point;
fusing a two-dimensional convolution neural network accelerator node and a batch normalization neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution batch normalization first fusion point, wherein the batch normalization neural network accelerator node includes one of the following: a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node;
fusing a two-dimensional convolution neural network accelerator node and a bias-add neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution bias first fusion point;
and fusing a matrix multiplication neural network accelerator node and a bias-add neural network accelerator node in the computational graph of the neural network model algorithm into a matrix bias first fusion point.
According to an aspect of the present disclosure, there is provided a neural network model algorithm compiling apparatus, the apparatus including:
the first fusion point acquisition module is used for fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point, wherein the computational graph comprises a plurality of nodes, and the nodes comprise neural network accelerator nodes and central processing unit nodes;
the intermediate representation acquisition module is used for compiling the first fusion point and the nodes other than the first fusion point separately to obtain an intermediate representation of the computational graph;
and the executable file acquisition module is used for obtaining the executable file of the neural network model algorithm according to the intermediate representation of the computational graph.
In a possible implementation manner, the first fusion point acquisition module is configured to:
determine at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm as nodes to be fused;
and, when the input data of the nodes to be fused, other than the first node to be fused and the last node to be fused among the at least two consecutive neural network accelerator nodes, come from neural network accelerator nodes, fuse the nodes to be fused to obtain a first fusion point.
In a possible implementation manner, the first fusion point acquisition module is further configured to:
determine at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm as nodes to be fused;
when the output data of the nodes to be fused are sent to at least two central processing unit nodes, add a dependency relationship between the nodes to be fused that send output data to the central processing unit nodes;
and fuse the nodes to be fused to obtain a first fusion point.
In one possible implementation, the apparatus further includes:
the second fusion point acquisition module, used for fusing at least two consecutive central processing unit nodes to obtain a second fusion point;
the intermediate representation acquisition module is further configured to:
compile the first fusion point, the second fusion point, and the nodes other than the first fusion point and the second fusion point separately to obtain the intermediate representation of the computational graph.
In one possible implementation, the apparatus further includes:
the buffer space allocation module, used for allocating a buffer space for the first fusion point and/or the second fusion point, wherein the buffer space is not released when the first fusion point and/or the second fusion point completes its operation.
In a possible implementation manner, the first fusion point acquisition module is configured to perform at least one of the following combination manners:
fusing a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node in a computational graph of a neural network model algorithm into a fused batch normalization first fusion point;
fusing a two-dimensional convolution neural network accelerator node and a batch normalization neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution batch normalization first fusion point, wherein the batch normalization neural network accelerator node includes one of the following: a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node;
fusing a two-dimensional convolution neural network accelerator node and a bias-add neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution bias first fusion point;
and fusing a matrix multiplication neural network accelerator node and a bias-add neural network accelerator node in the computational graph of the neural network model algorithm into a matrix bias first fusion point.
According to an aspect of the present disclosure, there is provided a neural network operation device, including a neural network model algorithm compiling device according to any one of the above, the neural network operation device being configured to perform a set neural network operation.
According to an aspect of the present disclosure, there is provided a combined operation device, the combined operation device comprising one or more of the above neural network operation devices, a universal interconnection interface, and other processing devices;
and the neural network operation device interacts with the other processing devices to jointly complete the calculation operation specified by the user.
According to an aspect of the present disclosure, there is provided a neural network chip including:
the neural network model algorithm compiling device of any one of the above; or
The neural network operation device described above; or
The combined operation device described above.
According to an aspect of the present disclosure, there is provided an electronic apparatus including:
the neural network model algorithm compiling device of any one of the above; or
The neural network operation device described above; or
The combined operation device described above; or
The neural network chip is described above.
In an embodiment of the present disclosure, a computational graph of the neural network model is determined, the computational graph including a plurality of nodes, the nodes including NNA nodes and CPU nodes; at least two consecutive NNA nodes are fused to obtain a first fusion point; the first fusion point and the nodes other than the first fusion point are compiled separately to obtain an intermediate representation of the computational graph; and an executable file of the neural network model is obtained according to the intermediate representation of the computational graph. Fusing the consecutive NNA nodes in the computational graph of the neural network model can reduce the number of NNA kernel launches and the amount of data transferred between the NNA and the CPU, thereby saving system resources, shortening the running time of the neural network model, and improving the computational efficiency of the neural network model.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a neural network model algorithm compilation method according to an embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of NNA node fusion in a neural network model algorithm compilation method, according to an embodiment of the disclosure;
FIG. 3 illustrates a schematic diagram of NNA node fusion in a neural network model algorithm compilation method according to an embodiment of the disclosure;
FIG. 4 illustrates a flow diagram of a neural network model algorithm compilation method in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating fusion manners in a neural network model algorithm compiling method according to an embodiment of the disclosure;
FIG. 6 shows a block diagram of a neural network model algorithm compiling device according to an embodiment of the present disclosure;
fig. 7 shows a block diagram of a combined processing device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 is a flowchart illustrating a neural network model algorithm compiling method according to an embodiment of the present disclosure, and as shown in fig. 1, the neural network model algorithm compiling method includes:
and step S10, fusing at least two continuous neural network accelerator nodes in a calculation graph of the neural network model algorithm to obtain a first fusion point, wherein the calculation graph comprises a plurality of nodes, and the nodes comprise the neural network accelerator nodes and a central processing unit node.
In one possible implementation, the execution of the neural network model may be represented as a computational graph. The computational graph may be a directed acyclic graph comprising a plurality of nodes and edges connecting the nodes. A node represents an operator, such as a convolution operator or a batch normalization operator, and an edge between nodes represents the direction of data flow between the nodes. The input data of the neural network model are fed into the initial nodes of the computational graph, the operations are completed by the nodes along the edges, and the computational graph outputs the final result of the neural network model.
In a possible implementation manner, when the device executing the neural network model is configured with a Neural Network Accelerator (NNA) and a Central Processing Unit (CPU), the device executing the neural network model is an NNA + CPU heterogeneous platform. Different nodes of the neural network model computation graph may be assigned to the NNAs or the CPUs. When a node is assigned to operate on an NNA, the node is an NNA node. When a node is allocated to run on a CPU, the node is a CPU node. The number and connection relationships of the NNA nodes and the CPU nodes in the computational graph can be determined according to a neural network model. The present disclosure is not limited thereto.
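As a non-limiting illustration only (the Node and Graph types below are assumptions made for this sketch, not structures defined by the present disclosure), such a computational graph with device-assigned nodes could be modeled as follows:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        name: str            # e.g. "Conv2D_0"
        op: str              # operator type, e.g. "Conv2D", "BatchNorm"
        device: str          # "NNA" when assigned to the accelerator, "CPU" otherwise
        inputs: List["Node"] = field(default_factory=list)  # edges: data flows from inputs into this node

    @dataclass
    class Graph:
        nodes: List[Node] = field(default_factory=list)     # kept in topological (data-flow) order

    # Example: Conv2D (NNA) -> BiasAdd (NNA) -> Softmax (CPU)
    conv = Node("conv0", "Conv2D", "NNA")
    bias = Node("bias0", "BiasAdd", "NNA", inputs=[conv])
    soft = Node("softmax0", "Softmax", "CPU", inputs=[bias])
    graph = Graph([conv, bias, soft])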
In one possible implementation, when each NNA node in the computational graph performs its computation, an NNA kernel must be started once (launch NNA kernel). Launching the NNA kernel consumes system resources and takes from tens of microseconds to several milliseconds, and the launch cost is even higher when the CPU is busy. After at least two consecutive NNA nodes are fused to obtain a first fusion point, the first fusion point needs to launch the NNA kernel only once.
In one possible implementation, after each NNA node completes its computation, the computed data may need to be output to a CPU node. After at least two consecutive NNA nodes are merged into a first fusion point, the data sent to CPU nodes by the NNA nodes merged into the first fusion point can be sent together, which reduces data transfer between the NNA nodes and the CPU nodes.
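The grouping itself can be sketched as follows, again only as an illustration that reuses the assumed Node and Graph types above (the function name fuse_consecutive_nna_nodes is hypothetical): maximal runs of at least two consecutive NNA nodes are collected, and each run later becomes one first fusion point launched as a single NNA kernel.

    def fuse_consecutive_nna_nodes(graph):
        """Group maximal runs of consecutive NNA nodes; each run becomes one first fusion point."""
        fusion_points, run = [], []
        for node in graph.nodes:              # nodes are assumed to be in data-flow order
            if node.device == "NNA":
                run.append(node)
            else:
                if len(run) >= 2:             # a fusion point needs at least two consecutive NNA nodes
                    fusion_points.append(run)
                run = []
        if len(run) >= 2:
            fusion_points.append(run)
        return fusion_points

    print([[n.name for n in run] for run in fuse_consecutive_nna_nodes(graph)])   # e.g. [['conv0', 'bias0']]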
Step S20: compiling the first fusion point and the nodes other than the first fusion point separately to obtain the intermediate representation of the computational graph.
In one possible implementation, the system-on-chip that runs the neural network model may have its own execution code, i.e., its own language, and the programming languages used to implement neural network models are likewise varied. When neural network models implemented in different programming languages are to be executed on a system-on-chip, the neural network model algorithm may be converted into an intermediate representation so that the neural network model can run on the system-on-chip.
In one possible implementation, the intermediate representation and the intermediate code may be preset according to requirements. For example, an intermediate representation in the C language may be used. The language of the intermediate representation may be different from, or the same as, the language of the algorithm or of the system-on-chip. The present disclosure is not limited thereto.
In a possible implementation manner, each node in the computational graph may be compiled separately to obtain an intermediate representation of each node, and the intermediate representation of the computational graph is obtained from the intermediate representations of the nodes. After at least two consecutive neural network accelerator nodes are fused into a first fusion point, the first fusion point and the nodes other than the first fusion point can be compiled separately to obtain their respective intermediate representations. For the first fusion point, the operators corresponding to the NNA nodes within it, together with the input and output data of these operators, are converted in operation type and data type format to obtain the intermediate representation of the first fusion point. The intermediate representation of each node other than the first fusion point can be obtained in the same way. The intermediate representation of the computational graph may then be obtained from the intermediate representation of the first fusion point and the intermediate representations of the nodes other than the first fusion point.
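A minimal sketch of this per-node compilation is given below, assuming the Node/Graph types and the fusion pass sketched earlier; emit_ir and compile_graph are placeholder names, and the emitted text merely stands in for whatever intermediate representation is actually chosen.

    def emit_ir(op, device):
        # Placeholder: a real backend would convert the operator's operation type and the
        # data-type formats of its inputs and outputs into the target intermediate representation.
        return f"// {device}: {op}"

    def compile_graph(graph, fusion_points):
        fused_ids = {id(n) for run in fusion_points for n in run}
        sections = []
        for run in fusion_points:             # one section per first fusion point
            sections.append("\n".join(emit_ir(n.op, "NNA") for n in run))
        for node in graph.nodes:              # every node outside any fusion point is compiled on its own
            if id(node) not in fused_ids:
                sections.append(emit_ir(node.op, node.device))
        return "\n\n".join(sections)          # intermediate representation of the whole computational graph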
Step S30: obtaining an executable file of the neural network model algorithm according to the intermediate representation of the computational graph.
In one possible implementation, a conversion library between the execution code of the system-on-chip and the intermediate representation of the algorithm may be predefined. For example, the conversion library may be implemented in assembly language. The intermediate representation of the computational graph may be converted into an executable file of the neural network model using the conversion library.
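Purely as an illustration of the idea of a conversion library (the table below and the opcodes in it are invented for this sketch; the disclosure does not prescribe any particular format), the intermediate representation produced above could be lowered line by line:

    CONVERSION_LIBRARY = {
        "// NNA: Conv2D":  b"\x01",   # hypothetical target opcodes
        "// NNA: BiasAdd": b"\x02",
        "// CPU: Softmax": b"\x03",
    }

    def link_executable(ir_text):
        blob = bytearray()
        for line in ir_text.splitlines():
            if line:
                blob += CONVERSION_LIBRARY.get(line, b"\x00")   # unknown lines become a no-op placeholder
        return bytes(blob)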
In this embodiment, a computational graph of the neural network model is determined, the computational graph including a plurality of nodes, the nodes including NNA nodes and CPU nodes; at least two consecutive NNA nodes are fused to obtain a first fusion point; the first fusion point and the nodes other than the first fusion point are compiled separately to obtain the intermediate representation of the computational graph; and an executable file of the neural network model is obtained according to the intermediate representation of the computational graph. Fusing the consecutive NNA nodes in the computational graph of the neural network model can reduce the number of NNA kernel launches and the amount of data transferred between the NNA and the CPU, thereby saving system resources, shortening the running time of the neural network model, and improving the computational efficiency of the neural network model.
In a possible implementation manner, step S10 in the neural network model algorithm compiling method includes:
determining at least two consecutive neural network accelerator nodes in a computational graph of the neural network model algorithm as nodes to be fused;
and, when the input data of the nodes to be fused, other than the first node to be fused and the last node to be fused among the at least two consecutive neural network accelerator nodes, come from neural network accelerator nodes, fusing the nodes to be fused to obtain a first fusion point.
In one possible implementation, at least two consecutive NNA nodes in the computational graph may be determined as nodes to be fused, and the input data of each node to be fused may then be determined. According to the data flow direction, the nodes to be fused may be ordered as the first node to be fused, the second node to be fused, ..., and the last node to be fused.
In a possible implementation manner, the nodes to be fused may be fused when the input data of every remaining node to be fused, i.e., every node other than the first and the last, comes from NNA nodes. Fig. 2 is a schematic diagram illustrating NNA node fusion in a neural network model algorithm compiling method according to an embodiment of the disclosure. In the computational graph shown in Fig. 2, part of the input data of the NNA_2 node comes from the CPU_1 node; if NNA_0, NNA_1, and NNA_2 were fused, the NNA_2 node could not wait for the input data from the CPU_1 node, which would result in a calculation error.
In this embodiment, at least two consecutive NNA nodes are determined as nodes to be fused, and when the input data of the nodes to be fused, other than the first node to be fused and the last node to be fused among the at least two consecutive NNA nodes, come from NNA nodes, the nodes to be fused are fused to obtain a first fusion point. Apart from the first and last nodes to be fused, the fusion point includes no node whose input data comes from a CPU node, so the operation result of the first fusion point is accurate.
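The input-data condition of Fig. 2 can be written as a simple check, sketched here under the assumptions of the Node type above (can_fuse is not a name used by the disclosure):

    def can_fuse(run):
        """Return True when every candidate other than the first and last takes all inputs from NNA nodes."""
        for node in run[1:-1]:                                  # skip the first and the last node to be fused
            if any(inp.device != "NNA" for inp in node.inputs):
                return False                                    # an inner node would wait on a CPU node, as in Fig. 2
        return True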
In a possible implementation manner, step S10 in the neural network model algorithm compiling method includes:
determining at least two consecutive neural network accelerator nodes in a computational graph of the neural network model algorithm as nodes to be fused;
when the output data of the nodes to be fused are sent to at least two central processing unit nodes, adding a dependency relationship between the nodes to be fused that send output data to the central processing unit nodes;
and fusing the nodes to be fused to obtain a first fusion point.
In a possible implementation manner, when the output data of at least two nodes to be fused are sent to CPU nodes, the outputs may not be synchronized: the output data sent to one CPU node may already be consumed while the output data sent to another CPU node has not yet been produced, which may make subsequent calculation results inaccurate. A dependency relationship may therefore be added between these nodes to be fused, requiring them to wait for one another and output synchronously, so that the output data sent to the at least two CPU nodes are produced together.
Fig. 3 is a schematic diagram illustrating NNA node fusion in a neural network model algorithm compiling method according to an embodiment of the disclosure. As shown in Fig. 3, NNA_0, NNA_1, and NNA_2 are fused. The output data of NNA_1 is sent to CPU_1, and the output data of NNA_2 is sent to CPU_2. When the output data of NNA_1 is sent to CPU_1 and consumed, the output data of NNA_2 may not yet have been sent to CPU_2. A dependency may be added between NNA_1 and NNA_2 so that the output data of NNA_1 and NNA_2 are transmitted synchronously.
In this embodiment, when the output data of the nodes to be fused are sent to at least two CPU nodes, a dependency relationship is added between the nodes to be fused that send output data to CPU nodes. The dependency relationship allows the data of each node in the fusion point to be output simultaneously, avoiding unnecessary resource waste and calculation errors.
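One way to record such a dependency, sketched with assumed names and an assumed representation (a list of node-name pairs that must emit their outputs together, built on the Node/Graph types above), follows the situation of Fig. 3:

    def add_output_dependencies(run, graph):
        """Pair up the nodes in a candidate run whose outputs are consumed by CPU nodes."""
        producers = [n for n in run
                     if any(n in consumer.inputs and consumer.device == "CPU" for consumer in graph.nodes)]
        dependencies = []
        for a, b in zip(producers, producers[1:]):      # chain them so all CPU-bound outputs are emitted together
            dependencies.append((a.name, b.name))
        return dependencies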
Fig. 4 is a flowchart illustrating a neural network model algorithm compiling method according to an embodiment of the present disclosure, and as shown in fig. 4, the neural network model algorithm compiling method further includes:
and step S40, fusing at least two continuous CPU nodes to obtain a second fused point.
In one possible implementation, at least two consecutive CPU nodes may be merged. The CPU nodes in the computational graph may be fused into a plurality of second fusion points.
Step S20 includes:
and step S21, compiling the nodes except the first fusion point and the second fusion point, the first fusion point and the second fusion point respectively to obtain the intermediate expression of the calculation graph.
In one possible implementation, the first fusion point, the second fusion point, the CPU nodes remaining after the fusion, and the remaining NNA nodes may be compiled separately. Different conversion libraries may be provided for the first fusion point and the second fusion point, respectively, and each fusion point can be converted into its corresponding intermediate representation by its conversion library.
In this embodiment, at least two consecutive CPU nodes may be fused to obtain a second fusion point, and the first fusion point, the second fusion point, and the nodes other than the first and second fusion points are compiled separately to obtain the intermediate representation of the computational graph. Fusing the CPU nodes can further improve the operation efficiency of the neural network model and save system resources.
In one possible implementation manner, the neural network model algorithm compiling method further includes:
and allocating a buffer space for the first fusion point and/or the second fusion point, wherein the buffer space is not released when the first fusion point and/or the second fusion point completes the operation.
In one possible implementation, a large number of temporary outputs are generated during the calculation of the first fusion point and/or the second fusion point. A buffer space may be allocated for the first fusion point and/or the second fusion point, and the generated temporary outputs are written into the buffer space allocated for that fusion point.
In one possible implementation, during the calculation of the first fusion point and/or the second fusion point, the state of the buffer space corresponding thereto may be set to "unavailable" or "busy", etc. After the first fusion point and/or the second fusion point complete the calculation, the state of the buffer space corresponding to the first fusion point and/or the second fusion point may be set to "available" or "free", and the buffer space corresponding to the first fusion point and/or the second fusion point may not be released. A buffer list may be established to record and manage the buffer space allocated to the first fusion point and/or the second fusion point, and during the calculation process of the fusion point or after the calculation is completed, the state of the buffer space corresponding to the fusion point is updated in the buffer list.
In this embodiment, a buffer space may be allocated to the first fusion point and/or the second fusion point, and the buffer space is not released when the first fusion point and/or the second fusion point completes the operation. And the buffer space is allocated for the fusion point, so that the system resource occupation can be reduced, and the operation efficiency of the neural network model can be improved.
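A buffer list of this kind could look like the following sketch (BufferPool and its method names are assumptions made for this illustration; the point is only that the space is flagged busy or free rather than being released):

    class BufferPool:
        def __init__(self):
            self.buffers = {}                            # fusion-point name -> {"size": ..., "state": ...}

        def allocate(self, fusion_point, size):
            self.buffers[fusion_point] = {"size": size, "state": "free"}

        def begin(self, fusion_point):
            self.buffers[fusion_point]["state"] = "busy"     # the fusion point is computing

        def finish(self, fusion_point):
            self.buffers[fusion_point]["state"] = "free"     # operation done; the space is kept, not released

    pool = BufferPool()
    pool.allocate("first_fusion_point_0", size=4096)
    pool.begin("first_fusion_point_0")
    pool.finish("first_fusion_point_0")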
Fig. 5 is a schematic diagram illustrating fusion manners in a neural network model algorithm compiling method according to an embodiment of the present disclosure. As shown in Fig. 5, in one possible implementation, fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point includes at least one of the following merging manners (also summarized in the sketch after this list):
The batch normalization addition neural network accelerator node, the batch normalization reciprocal square root neural network accelerator node, the batch normalization multiplication neural network accelerator node, and the batch normalization subtraction neural network accelerator node in the computational graph of the neural network model algorithm are fused into a fused batch normalization first fusion point; for example, in Fig. 5, BatchNorm (Add, Rsqrt, Mul, Sub, Mul) is fused into FusedBatchNorm.
A two-dimensional convolution neural network accelerator node and a batch normalization neural network accelerator node in the computational graph of the neural network model algorithm are fused into a fused convolution batch normalization first fusion point, wherein the batch normalization neural network accelerator node includes one of the following: a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, or a batch normalization subtraction neural network accelerator node; for example, in Fig. 5, Conv2D + BatchNorm is fused into FusedConv2DBias.
A two-dimensional convolution neural network accelerator node and a bias-add neural network accelerator node in the computational graph of the neural network model algorithm are fused into a fused convolution bias first fusion point; for example, in Fig. 5, Conv2D + BiasAdd is fused into FusedConv2DBias.
A matrix multiplication neural network accelerator node and a bias-add neural network accelerator node in the computational graph of the neural network model algorithm are fused into a matrix bias first fusion point; for example, in Fig. 5, MatMul + BiasAdd is fused into MLP.
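The four patterns of Fig. 5 can be collected into a small rule table, sketched below; the operator and fused-node names follow the figure, while FUSION_PATTERNS and match_pattern are names invented for this illustration:

    FUSION_PATTERNS = [
        (("Add", "Rsqrt", "Mul", "Sub", "Mul"), "FusedBatchNorm"),    # batch normalization chain
        (("Conv2D", "BatchNorm"),               "FusedConv2DBias"),
        (("Conv2D", "BiasAdd"),                 "FusedConv2DBias"),
        (("MatMul", "BiasAdd"),                 "MLP"),
    ]

    def match_pattern(ops):
        """Return the fused node name for a tuple of consecutive NNA operator types, or None."""
        for pattern, fused_name in FUSION_PATTERNS:
            if tuple(ops) == pattern:
                return fused_name
        return None

    print(match_pattern(("MatMul", "BiasAdd")))   # -> MLP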
Fig. 6 is a block diagram of a neural network model algorithm compiling device according to an embodiment of the present disclosure. As shown in Fig. 6, the neural network model algorithm compiling device includes:
a first fusion point acquisition module 10, configured to fuse at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point, wherein the computational graph comprises a plurality of nodes, and the nodes comprise neural network accelerator nodes and central processing unit nodes;
an intermediate representation acquisition module 20, configured to compile the first fusion point and the nodes other than the first fusion point separately to obtain an intermediate representation of the computational graph;
and an executable file acquisition module 30, configured to obtain an executable file of the neural network model algorithm according to the intermediate representation of the computational graph.
In a possible implementation manner, the first fusion point acquisition module is configured to:
determine at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm as nodes to be fused;
and, when the input data of the nodes to be fused, other than the first node to be fused and the last node to be fused among the at least two consecutive neural network accelerator nodes, come from neural network accelerator nodes, fuse the nodes to be fused to obtain a first fusion point.
In a possible implementation manner, the first fusion point acquisition module is further configured to:
determine at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm as nodes to be fused;
when the output data of the nodes to be fused are sent to at least two central processing unit nodes, add a dependency relationship between the nodes to be fused that send output data to the central processing unit nodes;
and fuse the nodes to be fused to obtain a first fusion point.
In one possible implementation, the apparatus further includes:
the second fusion point acquisition module, used for fusing at least two consecutive central processing unit nodes to obtain a second fusion point;
the intermediate representation acquisition module is further configured to:
compile the first fusion point, the second fusion point, and the nodes other than the first fusion point and the second fusion point separately to obtain the intermediate representation of the computational graph.
In one possible implementation, the apparatus further includes:
the buffer space allocation module, used for allocating a buffer space for the first fusion point and/or the second fusion point, wherein the buffer space is not released when the first fusion point and/or the second fusion point completes its operation.
In a possible implementation manner, the first fusion point acquisition module is configured to perform at least one of the following combination manners:
fusing a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node in a computational graph of a neural network model algorithm into a fused batch normalization first fusion point;
fusing a two-dimensional convolution neural network accelerator node and a batch normalization neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution batch normalization first fusion point, wherein the batch normalization neural network accelerator node includes one of the following: a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node;
fusing a two-dimensional convolution neural network accelerator node and a bias-add neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution bias first fusion point;
and fusing a matrix multiplication neural network accelerator node and a bias-add neural network accelerator node in the computational graph of the neural network model algorithm into a matrix bias first fusion point.
Fig. 7 is a block diagram of a combined processing device according to an embodiment of the disclosure, as shown in fig. 7, which includes the neural network operation device, the universal interconnection interface, and other processing devices.
The neural network operation device interacts with the other processing devices to jointly complete operations specified by the user. The other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors; the number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the neural network operation device and external data and control, performing data transfer and basic control such as starting and stopping the neural network operation device; the other processing devices can also cooperate with the neural network operation device to complete operation tasks. The universal interconnection interface is used for transmitting data and control instructions between the neural network operation device and the other processing devices. The neural network operation device obtains the required input data from the other processing devices and writes it into an on-chip storage device of the neural network operation device; it can obtain control instructions from the other processing devices and write them into an on-chip control cache; and it can also read the data in its storage module and transmit it to the other processing devices.
The combined processing device may further include a storage device, which is connected to the neural network operation device and the other processing devices, respectively. The storage device is used for storing data of the neural network operation device and the other processing devices, and is especially suitable for data that needs to be computed but cannot be entirely held in the internal storage of the neural network operation device or the other processing devices.
The combined processing device can serve as the SoC (system on chip) of devices such as mobile phones, robots, unmanned aerial vehicles, and video monitoring equipment, effectively reducing the core area of the control part, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as a camera, a display, a mouse, a keyboard, a network card, or a WiFi interface.
In a possible implementation manner, the present disclosure further provides a neural network chip, which includes the above neural network operation device or the combined processing device.
In a possible implementation manner, the present disclosure further provides a chip packaging structure, which includes the above chip.
In a possible implementation manner, the present disclosure further provides a board card, which includes the above chip package structure.
In a possible implementation manner, the present disclosure further provides an electronic device, which includes the above board card.
The electronic device comprises a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The embodiments of the present disclosure are described in detail above, and the principles and embodiments of the present disclosure are explained herein by applying specific embodiments, and the descriptions of the embodiments are only used to help understanding the method and the core ideas of the present disclosure; meanwhile, for a person skilled in the art, based on the idea of the present disclosure, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present disclosure should not be construed as a limitation to the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A method for compiling a neural network model algorithm, the method comprising:
fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point, wherein the computational graph comprises a plurality of nodes, the nodes comprise neural network accelerator nodes and central processing unit nodes, and the computational graph comprises a directed acyclic graph;
compiling the first fusion point and the nodes other than the first fusion point separately to obtain an intermediate representation of the computational graph;
obtaining an executable file of the neural network model algorithm according to the intermediate representation of the computational graph,
wherein fusing at least two consecutive neural network accelerator nodes in the computational graph of the neural network model algorithm to obtain the first fusion point comprises:
determining at least two consecutive neural network accelerator nodes in the computational graph of the neural network model algorithm as nodes to be fused;
and when the input data of the nodes to be fused, other than the first node to be fused and the last node to be fused, come from neural network accelerator nodes, fusing the nodes to be fused to obtain the first fusion point.
2. The method of claim 1, wherein fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point comprises:
determining at least two consecutive neural network accelerator nodes in the computational graph of the neural network model algorithm as nodes to be fused;
when the output data of the nodes to be fused are sent to at least two central processing unit nodes, adding a dependency relationship between the nodes to be fused that send output data to the central processing unit nodes;
and fusing the nodes to be fused to obtain the first fusion point.
3. The method of claim 1, further comprising:
and allocating buffer space for the first fusion point, wherein the buffer space is not released when the first fusion point completes the operation.
4. The method of claim 1, wherein fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point comprises at least one of the following combinations:
merging a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node in a computational graph of a neural network model algorithm into a fused batch normalization fusion point;
merging a two-dimensional convolution neural network accelerator node and a batch normalization neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution batch normalization fusion point, wherein the batch normalization neural network accelerator node comprises one of the following neural network accelerator nodes: a batch normalization addition neural network accelerator node, a batch normalization reciprocal square root neural network accelerator node, a batch normalization multiplication neural network accelerator node, and a batch normalization subtraction neural network accelerator node;
merging a two-dimensional convolution neural network accelerator node and a bias-add neural network accelerator node in a computational graph of a neural network model algorithm into a fused convolution bias fusion point;
and merging a matrix multiplication neural network accelerator node and a bias-add neural network accelerator node in the computational graph of the neural network model algorithm into a matrix bias fusion point.
5. An apparatus for compiling a neural network model algorithm, the apparatus comprising:
the first fusion point acquisition module, used for fusing at least two consecutive neural network accelerator nodes in a computational graph of a neural network model algorithm to obtain a first fusion point, wherein the computational graph comprises a plurality of nodes, the nodes comprise neural network accelerator nodes and central processing unit nodes, and the computational graph comprises a directed acyclic graph;
the intermediate representation acquisition module, used for compiling the first fusion point and the nodes other than the first fusion point separately to obtain an intermediate representation of the computational graph;
an executable file acquisition module, used for obtaining an executable file of the neural network model algorithm according to the intermediate representation of the computational graph,
wherein the first fusion point acquisition module is configured to:
determine at least two consecutive neural network accelerator nodes in the computational graph of the neural network model algorithm as nodes to be fused;
and, when the input data of the nodes to be fused, other than the first node to be fused and the last node to be fused, come from neural network accelerator nodes, fuse the nodes to be fused to obtain the first fusion point.
6. The apparatus of claim 5, wherein the first fusion point obtaining module is further configured to:
determine at least two consecutive neural network accelerator nodes in the computational graph of the neural network model algorithm as nodes to be fused;
when the output data of the first fusion point are sent to at least two central processing unit nodes, add a dependency relationship for the nodes to be fused that send the output data to the central processing unit nodes;
and fuse the nodes to be fused to obtain the first fusion point.
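One possible reading of the dependency step in claim 6, again as an illustrative sketch building on the same assumed Node structure: before fusion, any node to be fused whose output data are sent to at least two central processing unit nodes has an explicit dependency recorded for those CPU consumers. The consumers map and control_deps table are assumptions for the example.

    def add_cpu_dependencies(nodes_to_fuse, consumers, control_deps):
        """Record a dependency from each CPU consumer back to the node to be fused
        whose output data it reads, whenever that output goes to at least two
        central processing unit nodes.

        consumers: dict mapping a node name to the list of Nodes that read its output.
        control_deps: dict mapping a consumer's name to the node names it must wait for.
        """
        for node in nodes_to_fuse:
            cpu_consumers = [c for c in consumers.get(node.name, []) if c.device == "cpu"]
            if len(cpu_consumers) >= 2:
                for cpu_node in cpu_consumers:
                    # Explicit bookkeeping so the later fusion cannot reorder the
                    # node past the CPU nodes that still depend on its output.
                    control_deps.setdefault(cpu_node.name, []).append(node.name)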
7. The apparatus of claim 5, further comprising:
a buffer space allocation module, configured to allocate buffer space for the first fusion point, wherein the buffer space is not released when the first fusion point completes its operation.
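The buffer space behaviour in claim 7 can be sketched as a small cache that allocates buffer space per fusion point and deliberately does not release it when the fusion point completes its operation, so a later run of the same fusion point can reuse the space. The class name, the method names, and the use of a bytearray in place of accelerator memory are assumptions for the example.

    class FusionPointBufferPool:
        """Keeps the buffer allocated for each fusion point alive across runs."""

        def __init__(self):
            self._buffers = {}                  # fusion point name -> buffer

        def allocate(self, fusion_point, size_bytes):
            """Allocate (or reuse) buffer space for a fusion point."""
            buf = self._buffers.get(fusion_point)
            if buf is None or len(buf) < size_bytes:
                buf = bytearray(size_bytes)     # stand-in for accelerator memory
                self._buffers[fusion_point] = buf
            return buf

        def on_operation_complete(self, fusion_point):
            """Called when the fusion point completes its operation; the buffer
            is intentionally not released, matching claim 7."""
            pass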
8. The apparatus of claim 5, wherein the first fusion point obtaining module is configured to perform at least one of the following combinations:
merging a batch-normalization add neural network accelerator node, a batch-normalization square-root neural network accelerator node, a batch-normalization multiply neural network accelerator node, and a batch-normalization subtract neural network accelerator node in the computational graph of the neural network model algorithm into a fused batch-normalization fusion point;
merging a two-dimensional convolution neural network accelerator node and a batch-normalization neural network accelerator node in the computational graph of the neural network model algorithm into a fused convolution batch-normalization fusion point, wherein the batch-normalization neural network accelerator node is one of the following neural network accelerator nodes: a batch-normalization add neural network accelerator node, a batch-normalization square-root neural network accelerator node, a batch-normalization multiply neural network accelerator node, and a batch-normalization subtract neural network accelerator node;
merging a two-dimensional convolution neural network accelerator node and a bias-addition neural network accelerator node in the computational graph of the neural network model algorithm into a fused convolution bias fusion point;
and merging a matrix multiplication neural network accelerator node and a bias-addition neural network accelerator node in the computational graph of the neural network model algorithm into a matrix bias fusion point.
9. A neural network operation device, comprising one or more neural network model algorithm compiling apparatuses according to any one of claims 5 to 8, wherein the neural network operation device is configured to perform a specified neural network operation.
10. A combined operation device, characterized in that the combined operation device comprises one or more neural network operation devices according to claim 9, a universal interconnection interface, and other processing devices;
and the neural network operation device interacts with the other processing devices to jointly complete a calculation operation specified by a user.
11. A neural network chip, comprising: the neural network model algorithm compiling apparatus of any one of claims 5 to 8; or
the neural network operation device of claim 9; or
the combined operation device of claim 10.
12. An electronic device, characterized in that the electronic device comprises:
the neural network model algorithm compiling apparatus of any one of claims 5 to 8; or
the neural network operation device of claim 9; or
the combined operation device of claim 10; or
the neural network chip of claim 11.
CN201811456696.3A 2018-11-30 2018-11-30 Neural network model algorithm compiling method and device and related products Active CN109284815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811456696.3A CN109284815B (en) 2018-11-30 2018-11-30 Neural network model algorithm compiling method and device and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811456696.3A CN109284815B (en) 2018-11-30 2018-11-30 Neural network model algorithm compiling method and device and related products

Publications (2)

Publication Number Publication Date
CN109284815A CN109284815A (en) 2019-01-29
CN109284815B true CN109284815B (en) 2020-11-24

Family

ID=65173788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811456696.3A Active CN109284815B (en) 2018-11-30 2018-11-30 Neural network model algorithm compiling method and device and related products

Country Status (1)

Country Link
CN (1) CN109284815B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020173503A1 (en) * 2019-02-28 2020-09-03 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111626398B (en) * 2019-02-28 2022-12-09 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111695158B (en) * 2019-03-15 2022-12-09 上海寒武纪信息科技有限公司 Operation method and device
CN111694571B (en) * 2019-03-15 2022-11-01 上海寒武纪信息科技有限公司 Compiling method and device
CN110009100B (en) * 2019-03-28 2021-01-05 安徽寒武纪信息科技有限公司 Calculation method of user-defined operator and related product
CN112668707B (en) * 2019-05-31 2024-05-17 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN110490302B (en) * 2019-08-12 2022-06-07 中科寒武纪科技股份有限公司 Neural network compiling and optimizing method and device and related products
CN110503199A (en) * 2019-08-14 2019-11-26 北京中科寒武纪科技有限公司 Method for splitting and device, the electronic equipment and storage medium of operation node
CN110929860B (en) * 2019-11-07 2020-10-23 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
CN110908667B (en) * 2019-11-18 2021-11-16 北京迈格威科技有限公司 Method and device for joint compilation of neural network and electronic equipment
US20210182036A1 (en) * 2019-12-12 2021-06-17 Huawei Technologies Co., Ltd. Hardware platform specific operator fusion in machine learning
WO2021259039A1 (en) * 2020-06-22 2021-12-30 深圳鲲云信息科技有限公司 Neural network model customization method, system and device, and storage medium
CN112214222B (en) * 2020-10-27 2021-11-19 华中科技大学 Sequential structure for realizing feedforward neural network in COStream and compiling method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101860752A (en) * 2010-05-07 2010-10-13 浙江大学 Video code stream parallelization method for embedded multi-core system
CN106650922A (en) * 2016-09-29 2017-05-10 清华大学 Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system
CN107239315A (en) * 2017-04-11 2017-10-10 北京深鉴智能科技有限公司 Towards the programming model of neural network heterogeneous computing platforms

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101860752A (en) * 2010-05-07 2010-10-13 浙江大学 Video code stream parallelization method for embedded multi-core system
CN106650922A (en) * 2016-09-29 2017-05-10 清华大学 Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system
CN107239315A (en) * 2017-04-11 2017-10-10 北京深鉴智能科技有限公司 Towards the programming model of neural network heterogeneous computing platforms

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"TVM: An Automated End-to-End Optimizing Compiler for Deep Learning";Tianqi Chen 等;《arXiv:1802.04799v3》;20181005;第1-16页 *
"内核融合:GPU深度学习的加速神器";薛继龙;《微信公众号:微软研究员AI头条》;20181011;全文 *

Also Published As

Publication number Publication date
CN109284815A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN109284815B (en) Neural network model algorithm compiling method and device and related products
CN109543825B (en) Neural network model algorithm compiling method and device and related products
CN113641413A (en) Target model loading and updating method and device, readable medium and electronic equipment
CN110647356A (en) Arithmetic device and related product
CN111079909B (en) Operation method, system and related product
CN109558565B (en) Operation method, device and related product
CN109542837B (en) Operation method, device and related product
CN109543835B (en) Operation method, device and related product
CN111078291B (en) Operation method, system and related product
CN115373646A (en) Information expansion method, device and related product
CN111079925B (en) Operation method, device and related product
CN109543836B (en) Operation method, device and related product
CN109558943B (en) Operation method, device and related product
CN111078284A (en) Operation method, system and related product
CN109583580B (en) Operation method, device and related product
CN109558564B (en) Operation method, device and related product
CN109543833B (en) Operation method, device and related product
CN111078283B (en) Operation method, device and related product
CN109543834B (en) Operation method, device and related product
CN111078282B (en) Operation method, device and related product
CN111079907B (en) Operation method, device and related product
CN111078125B (en) Operation method, device and related product
CN111079914B (en) Operation method, system and related product
CN111079910B (en) Operation method, device and related product
CN111078280B (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201027

Address after: Room 611-194, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Anhui Cambricon Information Technology Co., Ltd

Address before: 200120 6th Floor, Block B, 168 Tonghui Road, Pudong New Area, Shanghai

Applicant before: Shanghai Cambricon Information Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant