CN110717574B - Neural network operation method and device and heterogeneous intelligent chip


Info

Publication number: CN110717574B
Application number: CN201810757736.1A
Authority: CN (China)
Prior art keywords: operated, layer, neural network, network, layers
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN110717574A
Inventor: 丁健 (Ding Jian)
Current assignee: Hangzhou Hikvision Digital Technology Co., Ltd.
Original assignee: Hangzhou Hikvision Digital Technology Co., Ltd.

Application CN201810757736.1A was filed by Hangzhou Hikvision Digital Technology Co., Ltd., published as CN110717574A, and granted as CN110717574B (legal status: Active).

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation of neural networks using electronic means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT)
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

An embodiment of the invention provides a neural network operation method and apparatus, and a heterogeneous intelligent chip. The method includes: acquiring a neural network to be run; dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network; and, for each layer to be run, determining a computing core that satisfies a preset operation condition, so that the computing cores run the layers in a preset running order. This scheme improves the operation efficiency of the neural network.

Description

Neural network operation method and device and heterogeneous intelligent chip
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a neural network operation method and apparatus, and a heterogeneous intelligent chip.
Background
A neural network, the foundation of machine learning and deep learning, is an intelligent model that analyzes data by simulating the mechanisms of the human brain. Neural networks have become the mainstream approach in image classification, object detection, object tracking, speech recognition, and similar applications.
As deep learning develops, network structures grow ever larger and the number of network layers keeps increasing, so traditional hardware platforms can no longer meet the demand for efficient neural network operation. To cope with this, related operation methods either optimize each network layer to reduce its computation, or overlap the data transfer and the computation of each layer, thereby shortening the running time of the whole network and improving its operation efficiency.
However, as network-layer operations become more complex, and because layer optimization depends on a known operation mechanism, the efficiency gains obtainable by optimizing individual layers are limited in practice, and the improvement is often modest.
Disclosure of Invention
Embodiments of the invention aim to provide a neural network operation method and apparatus, and a heterogeneous intelligent chip, so as to improve the operation efficiency of neural networks. The specific technical solutions are as follows:
In a first aspect, an embodiment of the invention provides a neural network operation method, including:
acquiring a neural network to be run;
dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network; and
for each layer to be run, determining a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, acquiring the neural network to be run includes:
acquiring, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
Optionally, dividing the neural network into network layers to obtain a plurality of layers to be run includes:
dividing the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
Optionally, determining, for each layer to be run, a computing core that satisfies a preset operation condition includes:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rule of the current layer to be run; determining, based on the operation rule, a computing core whose operation condition matches the operation rule; and sending the current layer to be run to that computing core so that the computing core runs it.
Optionally, after dividing the neural network into the plurality of layers to be run, the method further includes:
obtaining the computation amount of each layer to be run by analysis, and judging whether the computation amount of each layer to be run is greater than a preset threshold; and
for any layer to be run whose computation amount is greater than the preset threshold, splitting that layer to obtain a plurality of sub-layers of the layer.
Determining, for each layer to be run, a computing core that satisfies a preset operation condition then includes:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rules of all sub-layers in the current layer that have not yet run; determining, based on each operation rule, a computing core whose operation condition matches that rule; and sending each unrun sub-layer to its corresponding computing core so that the computing cores run the sub-layers.
In a second aspect, an embodiment of the invention provides a neural network operation device, including:
an acquisition module, configured to acquire a neural network to be run;
a network parsing module, configured to divide the neural network into network layers to obtain a plurality of layers to be run of the neural network; and
a scheduler module, configured to determine, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, the acquisition module is specifically configured to acquire, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
Optionally, the network parsing module is specifically configured to divide the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
Optionally, the scheduler module is specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rule of the current layer to be run, determine, based on the operation rule, a computing core whose operation condition matches it, and send the current layer to that computing core so that the core runs it.
Optionally, the device further includes:
a layer splitting module, configured to obtain the computation amount of each layer to be run by analysis, judge whether the computation amount of each layer is greater than a preset threshold, and, for any layer whose computation amount is greater than the preset threshold, split that layer to obtain a plurality of sub-layers.
The scheduler module is then specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rules of all sub-layers in the current layer that have not yet run, determine, based on each operation rule, a computing core whose operation condition matches that rule, and send each unrun sub-layer to its corresponding computing core so that the cores run the sub-layers.
In a third aspect, an embodiment of the invention provides a heterogeneous intelligent chip, including a main core, a plurality of computing cores, and a storage medium, where:
the storage medium is configured to store a computer program;
the main core is configured to implement any of the method steps of the first aspect when executing the computer program stored on the storage medium, and to determine the computing cores that satisfy the preset operation condition; and
each computing core is configured to run each layer to be run of the neural network in the preset running order.
In a fourth aspect, an embodiment of the invention provides a storage medium storing a computer program which, when executed by a main core, implements any of the method steps of the first aspect.
With the neural network operation method and device and the heterogeneous intelligent chip provided by the embodiments of the invention, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. As more and more chip vendors release heterogeneous intelligent chips that support neural network operation, such chips generally integrate one or more computing cores. Dividing the neural network into layers that are easier to run, and assigning each layer a computing core that satisfies the preset operation condition, makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings required by the embodiments are briefly introduced below. The drawings described here show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a neural network operation method according to an embodiment of the invention;
FIG. 2 is a flow chart of a neural network operation method according to another embodiment of the invention;
FIG. 3 is a schematic diagram of splitting a layer to be run according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the main core in a heterogeneous intelligent chip according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the execution flow of the scheduler module according to an embodiment of the invention;
FIG. 6 is a schematic structural diagram of a neural network operation device according to an embodiment of the invention;
FIG. 7 is a schematic structural diagram of a neural network operation device according to another embodiment of the invention;
FIG. 8 is a schematic structural diagram of a heterogeneous intelligent chip according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art without inventive effort fall within the scope of the invention.
To improve the operation efficiency of neural networks, embodiments of the invention provide a neural network operation method and apparatus, and a heterogeneous intelligent chip. The neural network operation method is described first.
The neural network operation method provided by the embodiments of the invention may be executed by a heterogeneous intelligent chip that implements functions such as image classification, speech recognition, and object detection. Such a chip integrates multiple processor cores with different instruction set architectures (e.g., CPU, GPU), including at least a main core with logic processing capability and computing cores that support common neural network layer computations (for example, a CNN (Convolutional Neural Network) accelerator whose layer computation is driven directly through configuration registers, or a DSP (Digital Signal Processor) that supports secondary programming and development). The method may be implemented by at least one of software, hardware circuits, and logic circuits inside this execution body.
As shown in FIG. 1, the neural network operation method according to an embodiment of the invention may include the following steps:
S101, acquiring a neural network to be run.
A neural network here is an artificial neural network, the foundation of machine learning and deep learning. The mainstream neural networks are CNN, RNN (Recurrent Neural Network), and DNN (Deep Neural Network); they can be trained end-to-end on samples so that the network acquires a given function.
As application demands grow and chip processing capability improves, running multiple neural networks on the same device (several different networks, or the same network run multiple times) has become a trend; that is, there may be more than one neural network to be run.
Optionally, S101 may specifically be: acquiring, according to preset network priorities, the neural network with the highest priority from the plurality of neural networks to be run.
Different neural networks implement different functions and have different real-time requirements, so each network is assigned a preset network priority accordingly: a network with a more important function or a stricter real-time requirement gets a higher priority and is run first. Of course, more than one network may share the highest priority; if the computing cores of the heterogeneous intelligent chip allow, several networks can run simultaneously in parallel. Taking the preset network priorities into account satisfies the different real-time requirements of different networks and gives the scheme good adaptability.
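As an illustration only, the priority-based selection in S101 can be sketched as below. This is not code from the patent; PendingNetwork, its fields, and acquire_network_to_run are hypothetical names standing in for whatever representation the chip firmware actually uses.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PendingNetwork:
    name: str
    priority: int                    # higher value = stricter real-time need
    layers: list = field(default_factory=list)

def acquire_network_to_run(pending: List[PendingNetwork]) -> PendingNetwork:
    """Return the highest-priority neural network awaiting execution."""
    return max(pending, key=lambda net: net.priority)

nets = [PendingNetwork("detection", priority=2),
        PendingNetwork("tracking", priority=5)]
print(acquire_network_to_run(nets).name)  # -> tracking
```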
Moreover, since different neural networks are independent of one another, a heterogeneous intelligent chip that contains enough computing cores may also run several networks of different priorities in parallel; this is not specifically limited here.
S102, dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network.
A neural network consists of multiple network layers (for example, convolution layers and pooling layers), each of which takes the output of the previous layer and performs an operation such as convolution or pooling. These operations are independent between layers, so each network layer can run on an independent computing core as long as the activations (the data passed between layers) are delivered in the network's execution order. A huge neural network is thus broken into network layers that a computing core can run quickly and efficiently.
When dividing the network, each network layer may become one layer to be run; alternatively, depending on layer complexity, several connected layers whose computations are coupled may be merged into a single layer to be run. For example, a convolution layer followed by a pooling layer may be merged into one layer to be run, reducing the bandwidth and scheduling cost of switching between layers to be run.
Optionally, S102 may specifically be: dividing the neural network based on the number of its network layers, so that the number of layers to be run equals the number of network layers.
If the heterogeneous intelligent chip contains enough computing cores, each network layer can become one layer to be run; for example, a network with 5 convolution layers and 2 pooling layers is divided into 7 layers to be run. Making each network layer its own layer to be run keeps the computation of every layer small enough, which in turn keeps each layer efficient to run. A minimal sketch of this division step, including the optional merging of connected layers, is given below.
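The sketch assumes a network is simply an ordered list of (name, kind) layer descriptors; the merge rule shown (a convolution layer followed by a pooling layer) mirrors the example above, and all names are illustrative.

```python
def divide_into_run_units(layers, merge_conv_pool=True):
    """layers: ordered list of (name, kind) descriptors for one network."""
    units, i = [], 0
    while i < len(layers):
        if (merge_conv_pool and i + 1 < len(layers)
                and layers[i][1] == "conv" and layers[i + 1][1] == "pool"):
            units.append([layers[i], layers[i + 1]])   # fused run unit
            i += 2
        else:
            units.append([layers[i]])                  # one layer per unit
            i += 1
    return units

net = [("c1", "conv"), ("p1", "pool"), ("c2", "conv"), ("fc", "fc")]
print(divide_into_run_units(net))
# -> [[('c1', 'conv'), ('p1', 'pool')], [('c2', 'conv')], [('fc', 'fc')]]
```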
S103, determining, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
After the computing cores are determined, each layer to be run can be sent to its core, and the cores then run the layers in the preset running order. Because a neural network is a directed acyclic graph in which the input of a later layer generally depends on the output of an earlier layer, a single network must be run layer by layer in the preset order.
The layers to be run obtained in S102 effectively form a task chain, with each layer as a task node. After the task chain is obtained, the idle state of every computing core must be checked in real time, and for each layer to be run a core satisfying the preset operation condition is chosen from the idle cores. The preset operation condition is whatever allows the layer to run: for example, if a layer needs a 5x5 convolution, the chosen core must at least be able to perform a 5x5 convolution. These conditions describe the core's operation capability and may include operation speed, operation stride, response time, and so on; a sketch of this matching step follows.
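The sketch models the preset operation condition as a capability profile on each core (largest supported kernel and a rough speed measure); the field names and thresholds are assumptions for illustration, not values from the patent.

```python
from dataclasses import dataclass

@dataclass
class ComputeCore:
    core_id: int
    idle: bool
    max_kernel: int        # largest convolution kernel the core supports
    ops_per_cycle: float   # rough measure of operation speed

def find_core(cores, required_kernel, min_ops):
    """Return the first idle core whose capabilities cover the layer's rule."""
    for core in cores:
        if (core.idle and core.max_kernel >= required_kernel
                and core.ops_per_cycle >= min_ops):
            return core
    return None  # nothing suitable is idle; the caller retries later

cores = [ComputeCore(0, True, 3, 1.0), ComputeCore(1, True, 7, 2.0)]
match = find_core(cores, required_kernel=5, min_ops=1.5)
print(match.core_id if match else "wait")  # -> 1
```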
If there are several neural networks to be run and the heterogeneous intelligent chip contains enough computing cores, the networks can run in parallel, which uses the platform's hardware resources more fully and further accelerates operation.
Optionally, S103 may specifically be: judging whether the current layer to be run has finished running; if not, acquiring the operation rule of the current layer, determining, based on the rule, a computing core whose operation condition matches it, and sending the current layer to that core so that the core runs it.
If the current layer to be run has finished running, the next layer in the preset running order becomes the current layer, and the judging step is executed again.
When every layer to be run in the neural network has finished, the operation result of the network is output.
The network layers of one neural network execute sequentially, layer by layer; therefore, the computing core for each layer can be determined one by one in the preset running order, and once all layers have run, the output of the last layer is the network's operation result. The operation rule of the current layer may be, for example, the convolution size and the actual computation amount of the layer's convolution; the computing space and capability of the chosen core must at least satisfy this rule. The layer is run by sending it to the core, and after it has been sent, the current layer can be regarded as dispatched for running.
Executing these steps in a loop guarantees that every layer to be run finishes in order, completing the operation of the network. After all layers have run, the operation result may be output actively, or passively: for example, a device waiting for the network to finish can be notified, and the device then fetches the network's output itself.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. As more and more chip vendors release heterogeneous intelligent chips that support neural network operation, such chips generally integrate one or more computing cores. Dividing the network into layers that are easier to run, and assigning each layer a computing core that satisfies the preset operation condition, makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
Based on the embodiment shown in FIG. 1, an embodiment of the invention further provides a neural network operation method, as shown in FIG. 2, including the following steps:
S201, acquiring a neural network to be run.
S202, dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network.
S201 and S202 are the same as S101 and S102 in the embodiment of FIG. 1, with the same or similar benefits, and are not repeated here.
S203, obtaining the computation amount of each layer to be run by analysis, and judging whether the computation amount of each layer is greater than a preset threshold.
S204, for any layer to be run whose computation amount is greater than the preset threshold, splitting that layer to obtain a plurality of sub-layers.
Some layers to be run may be so computation-heavy that the heterogeneous intelligent chip contains no core able to run them whole. Such a layer is therefore split further into several sub-layers that can run in parallel, and the task node a core runs at a time becomes one sub-layer. As shown in FIG. 3, for a network with N layers, after division and splitting, network layer 0 is split into two sub-layers (0-0 and 0-1) and network layer 1 into three (1-0, 1-1, and 1-2); these are the smallest task nodes. The sub-layers of one layer to be run can be run in parallel by their corresponding cores, which further improves efficiency. Splitting may follow attributes such as data alignment characteristics that help the cores run more efficiently. A layer to be run that is not split is itself a sub-layer (i.e., a task node). A sketch of this threshold-based splitting follows.
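The sketch summarizes a layer's workload as a single operation count and sizes sub-layers to a hypothetical alignment unit; the patent itself only requires that splitting respect attributes such as data alignment, so the sizing rule here is an assumption.

```python
import math

def split_layer(total_ops, threshold, align=64):
    """Split one layer's workload into several parallel sub-layers sized near
    the threshold, each slice rounded up to a multiple of `align`."""
    if total_ops <= threshold:
        return [total_ops]            # small enough: the layer is its own node
    n = math.ceil(total_ops / threshold)                # how many sub-layers
    chunk = math.ceil(total_ops / n / align) * align    # aligned slice size
    subs, remaining = [], total_ops
    while remaining > 0:
        subs.append(min(chunk, remaining))
        remaining -= subs[-1]
    return subs

print(split_layer(total_ops=1000, threshold=400))  # -> [384, 384, 232]
```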
S205, judging whether the current layer to be run has finished running; if not, executing S206 to S208; if so, executing S209.
S206, acquiring the operation rules of all sub-layers in the current layer that have not yet run.
S207, determining, based on each operation rule, a computing core whose operation condition matches that rule.
S208, sending each unrun sub-layer to its corresponding computing core so that the cores run the sub-layers.
S209, judging whether every layer to be run in the neural network has finished; if not, executing S210; if so, executing S211.
S210, determining the next layer in the preset running order as the current layer to be run, and returning to S205.
S211, outputting the operation result of the neural network.
Each sub-layer is the smallest task node a computing core runs, so the core able to run a sub-layer is determined from that sub-layer's operation rule, and each sub-layer is sent to its corresponding core. The cores run sub-layers of the same layer in parallel, while different layers of the same network run in sequence, which greatly improves the network's operation efficiency. And if the heterogeneous intelligent chip contains enough computing cores, multiple neural networks can execute in parallel, preserving efficiency across networks.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, dividing the network into layers that are easier to run and assigning each layer a suitable core makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
In addition, running sub-layers as the smallest task nodes on the heterogeneous intelligent chip makes the operation of the network finer-grained, improves its parallelism, and makes full utilization of the hardware computing resources possible; scheduling the split network tasks in real time gives multi-network parallelism good performance and adaptability.
The core functions of the embodiments of the invention are concentrated in the main core of the heterogeneous intelligent chip. The neural network operation method is described in detail below through the structure of the main core and the function of each of its modules. FIG. 4 shows the structure of the main core, which mainly includes a network parsing and layer splitting module 401, a scheduler module 402, and a computing core driver module 403.
The network parsing and layer splitting module 401 parses the neural network and splits its layers: its input is the neural network and its output is a network task chain whose nodes are split sub-tasks. Since the goal of the embodiments is full utilization of the computing cores, task splitting is the key point, and the splitting targets and strategies are mainly: split the network's computing task into units smaller than the network, because finer splitting favors balanced multi-network parallelism; split the computing task of a sequentially executed layer into several sub-tasks that can execute in parallel, so that even a single network has some parallel execution capability; and split the computing task into units better suited to the cores, e.g., units whose data satisfies certain alignment characteristics that help the cores execute more efficiently.
To achieve these three targets, module 401 splits as follows: divide the neural network in units of layers; continue splitting any layer whose computation is too large into several parallel sub-layers suited to core execution; and make the task unit a core executes each time one sub-layer. A concrete splitting example is shown in FIG. 3 and is not repeated here.
The scheduler module 402 is connected to the network parsing and layer splitting module 401. Its main function is to receive the network task chains generated by module 401, check the idle state of each computing core in real time, pick parallelizable task units from the several task chains, and issue them to a computing core for execution. The scheduler module 402 must maintain the running state of every task chain and parallelize the task execution of multiple networks as far as possible. Meanwhile, because different networks have different execution priorities and some must finish first, the scheduling policy must support ordering the networks, for example by raising the network priority of a network with strict real-time requirements so that its tasks execute first.
The scheduler module 402 actually drives the execution of the neural networks and is responsible for task scheduling and management; its performance determines the acceleration achieved by the method, so it should have a good scheduling policy algorithm. The policy mainly considers the following: the network layers of one neural network must execute in sequence; several sub-layers of one network layer of the same network may execute in parallel on different cores; network layers of different networks may execute in parallel on different cores; and higher-priority network tasks execute first. A sketch of an eligibility rule enforcing these constraints is given below.
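In the sketch, each task chain exposes only the sub-layers of its current layer, so a later layer is never offered before the current one finishes; the chain layout and names are illustrative, not from the patent.

```python
def runnable_tasks(task_chains):
    """task_chains: {net_name: {"priority": int, "current": [sub-layer, ...]}}.
    Yield tasks only from each chain's *current* layer, high priority first."""
    ordered = sorted(task_chains.items(),
                     key=lambda kv: kv[1]["priority"], reverse=True)
    for net, chain in ordered:
        for task in chain["current"]:   # sub-layers of one layer: parallel-safe
            yield net, task             # a later layer is never offered

chains = {"netA": {"priority": 5, "current": ["A0-0", "A0-1"]},
          "netB": {"priority": 1, "current": ["B3-0"]}}
print(list(runnable_tasks(chains)))
# -> [('netA', 'A0-0'), ('netA', 'A0-1'), ('netB', 'B3-0')]
```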
FIG. 5 shows the execution flow of the scheduler module 402, which is one loop: keep selecting a network from the several neural networks, select a task to execute from that network, and call a computing core to execute it. The detailed steps are:
S501, selecting one network from the neural networks currently to be executed. Selection takes the networks' priority attributes into account; a network with high real-time requirements is chosen first.
S502, judging the state of the network's current layer to be run: if it has not finished, executing S503; if it has, executing S505.
S503, selecting an unexecuted sub-layer from the currently unfinished network layer.
S504, querying the idle state of the available computing cores in real time, selecting a suitable core to execute the sub-layer, and returning to S501.
S505, judging whether all network layers of this network have finished: if so, executing S506; if not, executing S507.
S506, the network has finished: the program waiting for it is notified and can obtain the network output; execution returns to S501.
S507, the network has not finished: the next layer is selected for execution.
The computing core driver module 403 is connected to the scheduler module 402; its main function is to give the scheduler module 402 the ability to invoke a computing core, where each invocation covers one smallest task unit.
In this scheme, the main core splits the neural network into layers and sub-layers suited to core execution, and the sub-layers execute on the heterogeneous intelligent chip as the smallest task units. This makes execution finer-grained and improves parallelism: the network layers of a neural network are split into sub-layers that can execute in parallel, and network layers of different networks can also execute in parallel, making full utilization of the chip's hardware computing resources possible.
The scheduler module in the main core schedules the split network tasks in real time by managing which cores are idle or busy, and orders task scheduling by methods such as setting priorities. This gives multi-network parallelism good performance and adaptability, and satisfies the different real-time requirements of different networks.
Corresponding to the above method embodiments, an embodiment of the invention provides a neural network operation device, as shown in FIG. 6, including:
an acquisition module 610, configured to acquire a neural network to be run;
a network parsing module 620, configured to divide the neural network into network layers to obtain a plurality of layers to be run of the neural network; and
a scheduler module 630, configured to determine, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, the acquisition module 610 may be specifically configured to acquire, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
Optionally, the network parsing module 620 may be specifically configured to divide the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
Optionally, the scheduler module 630 may be specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rule of the current layer, determine, based on the rule, a computing core whose operation condition matches it, and send the current layer to that core so that the core runs it.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, dividing the network into layers that are easier to run and assigning each layer a suitable core makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
Based on the embodiment shown in FIG. 6, an embodiment of the invention further provides a neural network operation device, as shown in FIG. 7, including:
an acquisition module 710, configured to acquire a neural network to be run;
a network parsing module 720, configured to divide the neural network into network layers to obtain a plurality of layers to be run of the neural network;
a layer splitting module 730, configured to obtain the computation amount of each layer to be run by analysis, judge whether the computation amount of each layer is greater than a preset threshold, and, for any layer whose computation amount is greater than the preset threshold, split that layer to obtain a plurality of sub-layers; and
a scheduler module 740, configured to determine, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, the scheduler module 740 may be specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rules of all sub-layers in the current layer that have not yet run, determine, based on each operation rule, a computing core whose operation condition matches that rule, and send each unrun sub-layer to its corresponding computing core so that the cores run the sub-layers.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, this makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
In addition, running sub-layers as the smallest task nodes on the heterogeneous intelligent chip makes the operation of the network finer-grained, improves its parallelism, and makes full utilization of the hardware computing resources possible; scheduling the split network tasks in real time gives multi-network parallelism good performance and adaptability.
To improve the operation efficiency of neural networks, an embodiment of the invention further provides a heterogeneous intelligent chip, as shown in FIG. 8, including a main core 810, a plurality of computing cores 820, and a storage medium, where:
the storage medium is configured to store a computer program;
the main core 810 is configured to implement all the steps of the neural network operation method provided by the embodiments of the invention when executing the computer program stored on the storage medium, and to determine the computing cores that satisfy the preset operation condition; and
each computing core 820 is configured to run each layer to be run of the neural network in the preset running order.
The computing cores 820 may process in parallel or serially; they may be identical (e.g., all CPUs) or heterogeneous computing units comprising any two or more of CPU, GPU, FPGA, and ASIC.
The storage medium may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), for example at least one disk memory. It may be a storage medium independent of the main core and the computing cores of the heterogeneous intelligent chip, or it may be the memory of the main core or of a computing core.
The main core and the computing cores may be general-purpose processors, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; or a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In this embodiment, by reading and running the computer program stored in the storage medium, the main core of the heterogeneous intelligent chip acquires a neural network to be run, divides it into a plurality of layers to be run, and determines, for each layer, a computing core satisfying a preset operation condition, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, dividing the network into layers that are easier to run and assigning each layer a suitable core makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
In addition, corresponding to the neural network operation method provided by the above embodiments, an embodiment of the invention provides a storage medium storing a computer program which, when executed by a main core, implements all the steps of the neural network operation method provided by the embodiments of the invention.
In this embodiment, the storage medium stores a computer program that, when run, executes the neural network operation method: a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. This makes full use of the hardware computing capability of the heterogeneous intelligent chip and improves the operation efficiency of the whole network.
For the heterogeneous intelligent chip and storage medium embodiments, the description is relatively brief, since their content is substantially similar to the method embodiments; for relevant points, see the partial description of the method embodiments.
It should be noted that relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between them. Moreover, the terms "comprises", "comprising", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element qualified by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device comprising it.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments refer to one another, and each embodiment focuses on its differences from the others. In particular, the device, heterogeneous intelligent chip, and storage medium embodiments are described briefly because they are substantially similar to the method embodiments; for relevant points, see the partial description of the method embodiments.
The foregoing describes only preferred embodiments of the invention and is not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention falls within the protection scope of the invention.

Claims (8)

1. A neural network operation method, wherein the method is applied to a heterogeneous intelligent chip in which a plurality of processor cores with different instruction set architectures are integrated, including at least a plurality of computing cores, and the method comprises:
acquiring a neural network to be run;
dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network, wherein, according to the complexity of the network layers, a plurality of connected network layers whose computations are coupled are divided into one layer to be run, or each network layer of the neural network is divided into one layer to be run; and
determining, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order;
wherein determining, for each layer to be run, a computing core that satisfies a preset operation condition comprises:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rule of the current layer to be run; determining, based on the operation rule, a computing core whose operation condition matches the operation rule; and sending the current layer to be run to that computing core so that the computing core runs it, wherein the preset operation condition comprises: operation speed, operation stride, and response time;
wherein, after dividing the neural network into the plurality of layers to be run, the method further comprises:
obtaining the computation amount of each layer to be run by analysis, and judging whether the computation amount of each layer to be run is greater than a preset threshold; and
for any layer to be run whose computation amount is greater than the preset threshold, splitting that layer according to data alignment characteristics into a plurality of sub-layers that run in parallel, one computing core of the heterogeneous intelligent chip running one sub-layer;
and wherein determining, for each layer to be run, a computing core that satisfies a preset operation condition comprises:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rules of all sub-layers in the current layer that have not yet run; determining, based on each operation rule, a computing core whose operation condition matches that rule; and sending each unrun sub-layer to its corresponding computing core so that the computing cores run the sub-layers.
2. The method of claim 1, wherein acquiring the neural network to be run comprises:
acquiring, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
3. The method of claim 1, wherein dividing the neural network into network layers to obtain the plurality of layers to be run comprises:
dividing the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
4. A neural network operation device, wherein a heterogeneous intelligent chip is integrated in the device, a plurality of processor cores with different instruction set architectures are integrated in the heterogeneous intelligent chip, and the device comprises at least a plurality of computing cores, the device comprising:
an acquisition module, used for acquiring the neural network to be operated;
a network analysis module, used for performing network layer division on the neural network to obtain a plurality of layers to be operated of the neural network, wherein, according to network layer complexity, a plurality of network layers involved in connected computation are divided into one layer to be operated, or each network layer in the neural network is divided into one layer to be operated;
a scheduler module, used for determining, for each layer to be operated, a computing core that meets the preset operation conditions, so that the computing cores run the layers to be operated in a preset operation order;
the scheduler module being specifically configured to:
judge whether the current layer to be operated has finished running;
if the current layer to be operated has not finished running, acquire the operation rule of the current layer to be operated; determine, based on the operation rule, a computing core whose operating conditions correspond to the operation rule; and transmit the current layer to be operated to that computing core so that the computing core runs it, wherein the preset operation conditions comprise: computation speed, computation step length and response time;
a layer splitting module, used for obtaining the computation amount of each layer to be operated through analysis and judging whether the computation amount of each layer is larger than a preset threshold; and, for a layer to be operated whose computation amount is larger than the preset threshold, splitting that layer, according to the data alignment characteristic, into a plurality of sublayers that run in parallel, one computing core of the heterogeneous intelligent chip correspondingly running one sublayer;
the scheduler module being further specifically configured to:
judge whether the current layer to be operated has finished running;
if the current layer to be operated has not finished running, acquire the operation rules of all sub-layers of the current layer to be operated that have not yet run; determine, based on each operation rule, a computing core whose operation conditions correspond to that rule; and send each non-run sub-layer to its corresponding computing core so that each computing core runs that sub-layer.
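The four modules of claim 4 could be wired together roughly as below. This is an illustrative composition only: the patent names the modules but not any concrete interfaces, so all class and method names are invented.

    class AcquisitionModule:
        def acquire(self, pending):
            return max(pending, key=lambda n: n["priority"])

    class NetworkAnalysisModule:
        def divide(self, network):
            # one layer to be operated per network layer
            return [[lyr] for lyr in network["layers"]]

    class LayerSplittingModule:
        def __init__(self, threshold):
            self.threshold = threshold  # assumed preset threshold

        def maybe_split(self, layer, n_cores):
            if layer["flops"] > self.threshold:
                layer["sublayers"] = [dict(layer, flops=layer["flops"] / n_cores)
                                      for _ in range(n_cores)]
            return layer

    class SchedulerModule:
        def __init__(self, pick_core):
            self.pick_core = pick_core  # core-matching policy, injected

        def dispatch(self, layers, cores):
            for layer in layers:        # preset operation order
                for part in layer.get("sublayers", [layer]):
                    core = self.pick_core(cores, part["rule"])
                    # send `part` to `core` here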
5. The apparatus of claim 4, wherein the acquisition module is specifically configured to:
acquire, according to a preset network priority, the neural network with the highest priority from among a plurality of neural networks to be operated.
6. The apparatus of claim 4, wherein the network analysis module is specifically configured to:
divide the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be operated.
7. A heterogeneous intelligent chip, characterized by comprising a main core, a plurality of computing cores and a storage medium, wherein:
the storage medium is used for storing a computer program;
the main core is used for implementing the method steps of any one of claims 1 to 3 when executing the computer program stored on the storage medium, and for determining a computing core that meets the preset operation conditions;
each computing core is used for running each layer to be operated of the neural network according to the preset operation order.
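A rough Python sketch of the claim-7 division of labor, with queues standing in for the on-chip transport between the main core and the computing cores; the names and the hash-based core choice are placeholders, not the patent's mechanism.

    import queue
    import threading

    def main_core(layers, core_queues, pick_index):
        # the main core runs the scheduling method and assigns each
        # layer to the computing core meeting the preset conditions
        for layer in layers:             # preset operation order
            core_queues[pick_index(layer)].put(layer)
        for q in core_queues:
            q.put(None)                  # tell each core it is done

    def computing_core(q, run_layer):
        # each computing core runs the layers assigned to it, in order
        while True:
            layer = q.get()
            if layer is None:
                break
            run_layer(layer)

    qs = [queue.Queue() for _ in range(2)]
    workers = [threading.Thread(target=computing_core, args=(q, print))
               for q in qs]
    for w in workers:
        w.start()
    main_core(["conv1", "pool1", "fc1"], qs, lambda layer: hash(layer) % 2)
    for w in workers:
        w.join()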
8. A storage medium having stored therein a computer program which, when executed by a main core, implements the method steps of any one of claims 1 to 3.
CN201810757736.1A 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip Active CN110717574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810757736.1A CN110717574B (en) 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810757736.1A CN110717574B (en) 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip

Publications (2)

Publication Number Publication Date
CN110717574A CN110717574A (en) 2020-01-21
CN110717574B (en) 2023-07-07

Family

ID=69208951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810757736.1A Active CN110717574B (en) 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip

Country Status (1)

Country Link
CN (1) CN110717574B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488970A (en) * 2020-04-03 2020-08-04 北京思朗科技有限责任公司 Execution optimization method and device of neural network
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA
CN111737193B (en) * 2020-08-03 2020-12-08 深圳鲲云信息科技有限公司 Data storage method, device, equipment and storage medium
CN111985634B (en) * 2020-08-21 2024-06-14 北京灵汐科技有限公司 Operation method and device of neural network, computer equipment and storage medium
CN111814967B (en) * 2020-09-11 2021-02-23 鹏城实验室 Method, apparatus and storage medium for calculating inferential computation of neural network model
CN113158243A (en) * 2021-04-16 2021-07-23 苏州大学 Distributed image recognition model reasoning method and system
CN114647610B (en) * 2022-02-17 2022-11-29 北京百度网讯科技有限公司 Voice chip implementation method, voice chip and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956658A (en) * 2016-04-29 2016-09-21 北京比特大陆科技有限公司 Data processing method, data processing device and chip
CN107463990A (en) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 FPGA parallel acceleration method for convolutional neural networks
CN108090565A (en) * 2018-01-16 2018-05-29 电子科技大学 Parallelized training acceleration method for convolutional neural networks
CN108171117A (en) * 2017-12-05 2018-06-15 南京南瑞信息通信科技有限公司 Electric power artificial intelligence visual analysis system based on multi-core heterogeneous computing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779410B (en) * 2012-07-19 2014-08-06 杭州师范大学 Parallel implementation method of multi-source heterogeneous traffic data fusion
CN105607955A (en) * 2015-12-23 2016-05-25 浪潮集团有限公司 Calculation task distribution method and apparatus
CN107341545A (en) * 2017-07-25 2017-11-10 郑州云海信息技术有限公司 Deep neural network computing system and method
CN108228969A (en) * 2017-12-07 2018-06-29 中国航空工业集团公司西安航空计算技术研究所 Dual-FPGA collaborative working method for deep neural networks

Also Published As

Publication number Publication date
CN110717574A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN110717574B (en) Neural network operation method and device and heterogeneous intelligent chip
CN113254178B (en) Task scheduling method and device, electronic equipment and readable storage medium
US20200249998A1 (en) Scheduling computation graph heterogeneous computer system
CN110490309B (en) Operator fusion method for neural network and related product thereof
CN112711478B (en) Task processing method and device based on neural network, server and storage medium
CN110826708B (en) Method for realizing neural network model splitting by using multi-core processor and related product
US11609792B2 (en) Maximizing resource utilization of neural network computing system
CN112328380A (en) Task scheduling method and device based on heterogeneous computing
US20200184366A1 (en) Scheduling task graph operations
CN110689121A (en) Method for realizing neural network model splitting by using multi-core processor and related product
US20220414503A1 (en) Slo-aware artificial intelligence inference scheduler for heterogeneous processors in edge platforms
CN110308982A (en) Shared memory multiplexing method and device
CN117271101B (en) Operator fusion method and device, electronic equipment and storage medium
KR20220016859A (en) Method and apparatus for scheduling matrix jobs in digital processing system
CN112817730A (en) Deep neural network service batch processing scheduling method and system and GPU
Maruf et al. Extending resources for avoiding overloads of mixed‐criticality tasks in cyber‐physical systems
CN110928666B (en) Method and system for optimizing task parallelism based on memory in Spark environment
CN113127173B (en) Heterogeneous sensing cluster scheduling method and device
CN114662932A (en) Node-hierarchical workflow timing task scheduling method
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
KR20210023401A (en) Neural network computing method and system including the computing method
CN111985634B (en) Operation method and device of neural network, computer equipment and storage medium
Maste et al. Intelligent dynamic time quantum allocation in MLFQ scheduling
Zhang et al. A locally distributed mobile computing framework for DNN based android applications
Yang et al. Study on static task scheduling based on heterogeneous multi-core processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant