CN110717574B - Neural network operation method and device and heterogeneous intelligent chip


Info

Publication number: CN110717574B
Application number: CN201810757736.1A
Authority: CN (China)
Prior art keywords: operated, layer, neural network, network, layers
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN110717574A
Inventor: 丁健 (Ding Jian)
Current assignee: Hangzhou Hikvision Digital Technology Co., Ltd.
Original assignee: Hangzhou Hikvision Digital Technology Co., Ltd.

Application CN201810757736.1A was filed by Hangzhou Hikvision Digital Technology Co., Ltd., published as CN110717574A, and granted as CN110717574B (legal status: Active).

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation of neural networks using electronic means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT)
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

An embodiment of the invention provides a neural network operation method and apparatus, and a heterogeneous intelligent chip. The method includes: acquiring a neural network to be run; dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network; and, for each layer to be run, determining a computing core that satisfies a preset operation condition, so that the computing cores run the layers in a preset running order. This scheme improves the operation efficiency of the neural network.

Description

Neural network operation method and device and heterogeneous intelligent chip
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a neural network operation method and apparatus, and a heterogeneous intelligent chip.
Background
A neural network, the foundation of machine learning and deep learning, is an intelligent model that analyzes data by simulating the mechanisms of the human brain. Neural networks have become the mainstream approach in image classification, object detection, object tracking, speech recognition, and similar applications.
As deep learning develops, network structures grow ever larger and the number of network layers keeps increasing, so traditional hardware platforms can no longer meet the demand for efficient neural network operation. To cope with this, related operation methods either optimize each network layer to reduce its computation, or overlap the data transfer and the computation of each layer, thereby shortening the running time of the whole network and improving its operation efficiency.
However, as network-layer operations become more complex, and because layer optimization depends on a known operation mechanism, the efficiency gains obtainable by optimizing individual layers are limited in practice, and the improvement is often modest.
Disclosure of Invention
Embodiments of the invention aim to provide a neural network operation method and apparatus, and a heterogeneous intelligent chip, so as to improve the operation efficiency of neural networks. The specific technical solutions are as follows:
In a first aspect, an embodiment of the invention provides a neural network operation method, including:
acquiring a neural network to be run;
dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network; and
for each layer to be run, determining a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, acquiring the neural network to be run includes:
acquiring, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
Optionally, dividing the neural network into network layers to obtain a plurality of layers to be run includes:
dividing the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
Optionally, determining, for each layer to be run, a computing core that satisfies a preset operation condition includes:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rule of the current layer to be run; determining, based on the operation rule, a computing core whose operation condition matches the operation rule; and sending the current layer to be run to that computing core so that the computing core runs it.
Optionally, after dividing the neural network into the plurality of layers to be run, the method further includes:
obtaining the computation amount of each layer to be run by analysis, and judging whether the computation amount of each layer to be run is greater than a preset threshold; and
for any layer to be run whose computation amount is greater than the preset threshold, splitting that layer to obtain a plurality of sub-layers of the layer.
Determining, for each layer to be run, a computing core that satisfies a preset operation condition then includes:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rules of all sub-layers in the current layer that have not yet run; determining, based on each operation rule, a computing core whose operation condition matches that rule; and sending each unrun sub-layer to its corresponding computing core so that the computing cores run the sub-layers.
In a second aspect, an embodiment of the invention provides a neural network operation device, including:
an acquisition module, configured to acquire a neural network to be run;
a network parsing module, configured to divide the neural network into network layers to obtain a plurality of layers to be run of the neural network; and
a scheduler module, configured to determine, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, the acquisition module is specifically configured to acquire, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
Optionally, the network parsing module is specifically configured to divide the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
Optionally, the scheduler module is specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rule of the current layer to be run, determine, based on the operation rule, a computing core whose operation condition matches it, and send the current layer to that computing core so that the core runs it.
Optionally, the device further includes:
a layer splitting module, configured to obtain the computation amount of each layer to be run by analysis, judge whether the computation amount of each layer is greater than a preset threshold, and, for any layer whose computation amount is greater than the preset threshold, split that layer to obtain a plurality of sub-layers.
The scheduler module is then specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rules of all sub-layers in the current layer that have not yet run, determine, based on each operation rule, a computing core whose operation condition matches that rule, and send each unrun sub-layer to its corresponding computing core so that the cores run the sub-layers.
In a third aspect, an embodiment of the invention provides a heterogeneous intelligent chip, including a main core, a plurality of computing cores, and a storage medium, where:
the storage medium is configured to store a computer program;
the main core is configured to implement any of the method steps of the first aspect when executing the computer program stored on the storage medium, and to determine the computing cores that satisfy the preset operation condition; and
each computing core is configured to run each layer to be run of the neural network in the preset running order.
In a fourth aspect, an embodiment of the invention provides a storage medium storing a computer program which, when executed by a main core, implements any of the method steps of the first aspect.
With the neural network operation method and device and the heterogeneous intelligent chip provided by the embodiments of the invention, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. As more and more chip vendors release heterogeneous intelligent chips that support neural network operation, such chips generally integrate one or more computing cores. Dividing the neural network into layers that are easier to run, and assigning each layer a computing core that satisfies the preset operation condition, makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings required by the embodiments are briefly introduced below. The drawings described here show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a neural network operation method according to an embodiment of the invention;
FIG. 2 is a flow chart of a neural network operation method according to another embodiment of the invention;
FIG. 3 is a schematic diagram of splitting a layer to be run according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the main core in a heterogeneous intelligent chip according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the execution flow of the scheduler module according to an embodiment of the invention;
FIG. 6 is a schematic structural diagram of a neural network operation device according to an embodiment of the invention;
FIG. 7 is a schematic structural diagram of a neural network operation device according to another embodiment of the invention;
FIG. 8 is a schematic structural diagram of a heterogeneous intelligent chip according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art without inventive effort fall within the scope of the invention.
To improve the operation efficiency of neural networks, embodiments of the invention provide a neural network operation method and apparatus, and a heterogeneous intelligent chip. The neural network operation method is described first.
The neural network operation method provided by the embodiments of the invention may be executed by a heterogeneous intelligent chip that implements functions such as image classification, speech recognition, and object detection. Such a chip integrates multiple processor cores with different instruction set architectures (e.g., CPU, GPU), including at least a main core with logic processing capability and computing cores that support common neural network layer computations (for example, a CNN (Convolutional Neural Network) accelerator whose layer computation is driven directly through configuration registers, or a DSP (Digital Signal Processor) that supports secondary programming and development). The method may be implemented by at least one of software, hardware circuits, and logic circuits inside this execution body.
As shown in FIG. 1, the neural network operation method according to an embodiment of the invention may include the following steps:
S101, acquiring a neural network to be run.
A neural network here is an artificial neural network, the foundation of machine learning and deep learning. The mainstream neural networks are CNN, RNN (Recurrent Neural Network), and DNN (Deep Neural Network); they can be trained end-to-end on samples so that the network acquires a given function.
As application demands grow and chip processing capability improves, running multiple neural networks on the same device (several different networks, or the same network run multiple times) has become a trend; that is, there may be more than one neural network to be run.
Optionally, S101 may specifically be: acquiring, according to preset network priorities, the neural network with the highest priority from the plurality of neural networks to be run.
Different neural networks implement different functions and have different real-time requirements, so each network is assigned a preset network priority accordingly: a network with a more important function or a stricter real-time requirement gets a higher priority and is run first. Of course, more than one network may share the highest priority; if the computing cores of the heterogeneous intelligent chip allow, several networks can run simultaneously in parallel. Taking the preset network priorities into account satisfies the different real-time requirements of different networks and gives the scheme good adaptability.
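As an illustration only, the priority-based selection in S101 can be sketched as below. This is not code from the patent; PendingNetwork, its fields, and acquire_network_to_run are hypothetical names standing in for whatever representation the chip firmware actually uses.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PendingNetwork:
    name: str
    priority: int                    # higher value = stricter real-time need
    layers: list = field(default_factory=list)

def acquire_network_to_run(pending: List[PendingNetwork]) -> PendingNetwork:
    """Return the highest-priority neural network awaiting execution."""
    return max(pending, key=lambda net: net.priority)

nets = [PendingNetwork("detection", priority=2),
        PendingNetwork("tracking", priority=5)]
print(acquire_network_to_run(nets).name)  # -> tracking
```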
Moreover, since different neural networks are independent of one another, a heterogeneous intelligent chip that contains enough computing cores may also run several networks of different priorities in parallel; this is not specifically limited here.
S102, dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network.
A neural network consists of multiple network layers (for example, convolution layers and pooling layers), each of which takes the output of the previous layer and performs an operation such as convolution or pooling. These operations are independent between layers, so each network layer can run on an independent computing core as long as the activations (the data passed between layers) are delivered in the network's execution order. A huge neural network is thus broken into network layers that a computing core can run quickly and efficiently.
When dividing the network, each network layer may become one layer to be run; alternatively, depending on layer complexity, several connected layers whose computations are coupled may be merged into a single layer to be run. For example, a convolution layer followed by a pooling layer may be merged into one layer to be run, reducing the bandwidth and scheduling cost of switching between layers to be run.
Optionally, S102 may specifically be: dividing the neural network based on the number of its network layers, so that the number of layers to be run equals the number of network layers.
If the heterogeneous intelligent chip contains enough computing cores, each network layer can become one layer to be run; for example, a network with 5 convolution layers and 2 pooling layers is divided into 7 layers to be run. Making each network layer its own layer to be run keeps the computation of every layer small enough, which in turn keeps each layer efficient to run. A minimal sketch of this division step, including the optional merging of connected layers, is given below.
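The sketch assumes a network is simply an ordered list of (name, kind) layer descriptors; the merge rule shown (a convolution layer followed by a pooling layer) mirrors the example above, and all names are illustrative.

```python
def divide_into_run_units(layers, merge_conv_pool=True):
    """layers: ordered list of (name, kind) descriptors for one network."""
    units, i = [], 0
    while i < len(layers):
        if (merge_conv_pool and i + 1 < len(layers)
                and layers[i][1] == "conv" and layers[i + 1][1] == "pool"):
            units.append([layers[i], layers[i + 1]])   # fused run unit
            i += 2
        else:
            units.append([layers[i]])                  # one layer per unit
            i += 1
    return units

net = [("c1", "conv"), ("p1", "pool"), ("c2", "conv"), ("fc", "fc")]
print(divide_into_run_units(net))
# -> [[('c1', 'conv'), ('p1', 'pool')], [('c2', 'conv')], [('fc', 'fc')]]
```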
S103, determining, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
After the computing cores are determined, each layer to be run can be sent to its core, and the cores then run the layers in the preset running order. Because a neural network is a directed acyclic graph in which the input of a later layer generally depends on the output of an earlier layer, a single network must be run layer by layer in the preset order.
The layers to be run obtained in S102 effectively form a task chain, with each layer as a task node. After the task chain is obtained, the idle state of every computing core must be checked in real time, and for each layer to be run a core satisfying the preset operation condition is chosen from the idle cores. The preset operation condition is whatever allows the layer to run: for example, if a layer needs a 5x5 convolution, the chosen core must at least be able to perform a 5x5 convolution. These conditions describe the core's operation capability and may include operation speed, operation stride, response time, and so on; a sketch of this matching step follows.
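The sketch models the preset operation condition as a capability profile on each core (largest supported kernel and a rough speed measure); the field names and thresholds are assumptions for illustration, not values from the patent.

```python
from dataclasses import dataclass

@dataclass
class ComputeCore:
    core_id: int
    idle: bool
    max_kernel: int        # largest convolution kernel the core supports
    ops_per_cycle: float   # rough measure of operation speed

def find_core(cores, required_kernel, min_ops):
    """Return the first idle core whose capabilities cover the layer's rule."""
    for core in cores:
        if (core.idle and core.max_kernel >= required_kernel
                and core.ops_per_cycle >= min_ops):
            return core
    return None  # nothing suitable is idle; the caller retries later

cores = [ComputeCore(0, True, 3, 1.0), ComputeCore(1, True, 7, 2.0)]
match = find_core(cores, required_kernel=5, min_ops=1.5)
print(match.core_id if match else "wait")  # -> 1
```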
If there are several neural networks to be run and the heterogeneous intelligent chip contains enough computing cores, the networks can run in parallel, which uses the platform's hardware resources more fully and further accelerates operation.
Optionally, S103 may specifically be: judging whether the current layer to be run has finished running; if not, acquiring the operation rule of the current layer, determining, based on the rule, a computing core whose operation condition matches it, and sending the current layer to that core so that the core runs it.
If the current layer to be run has finished running, the next layer in the preset running order becomes the current layer, and the judging step is executed again.
When every layer to be run in the neural network has finished, the operation result of the network is output.
The network layers of one neural network execute sequentially, layer by layer; therefore, the computing core for each layer can be determined one by one in the preset running order, and once all layers have run, the output of the last layer is the network's operation result. The operation rule of the current layer may be, for example, the convolution size and the actual computation amount of the layer's convolution; the computing space and capability of the chosen core must at least satisfy this rule. The layer is run by sending it to the core, and after it has been sent, the current layer can be regarded as dispatched for running.
Executing these steps in a loop guarantees that every layer to be run finishes in order, completing the operation of the network. After all layers have run, the operation result may be output actively, or passively: for example, a device waiting for the network to finish can be notified, and the device then fetches the network's output itself.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. As more and more chip vendors release heterogeneous intelligent chips that support neural network operation, such chips generally integrate one or more computing cores. Dividing the network into layers that are easier to run, and assigning each layer a computing core that satisfies the preset operation condition, makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
Based on the embodiment shown in FIG. 1, an embodiment of the invention further provides a neural network operation method, as shown in FIG. 2, including the following steps:
S201, acquiring a neural network to be run.
S202, dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network.
S201 and S202 are the same as S101 and S102 in the embodiment of FIG. 1, with the same or similar benefits, and are not repeated here.
S203, obtaining the computation amount of each layer to be run by analysis, and judging whether the computation amount of each layer is greater than a preset threshold.
S204, for any layer to be run whose computation amount is greater than the preset threshold, splitting that layer to obtain a plurality of sub-layers.
Some layers to be run may be so computation-heavy that the heterogeneous intelligent chip contains no core able to run them whole. Such a layer is therefore split further into several sub-layers that can run in parallel, and the task node a core runs at a time becomes one sub-layer. As shown in FIG. 3, for a network with N layers, after division and splitting, network layer 0 is split into two sub-layers (0-0 and 0-1) and network layer 1 into three (1-0, 1-1, and 1-2); these are the smallest task nodes. The sub-layers of one layer to be run can be run in parallel by their corresponding cores, which further improves efficiency. Splitting may follow attributes such as data alignment characteristics that help the cores run more efficiently. A layer to be run that is not split is itself a sub-layer (i.e., a task node). A sketch of this threshold-based splitting follows.
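The sketch summarizes a layer's workload as a single operation count and sizes sub-layers to a hypothetical alignment unit; the patent itself only requires that splitting respect attributes such as data alignment, so the sizing rule here is an assumption.

```python
import math

def split_layer(total_ops, threshold, align=64):
    """Split one layer's workload into several parallel sub-layers sized near
    the threshold, each slice rounded up to a multiple of `align`."""
    if total_ops <= threshold:
        return [total_ops]            # small enough: the layer is its own node
    n = math.ceil(total_ops / threshold)                # how many sub-layers
    chunk = math.ceil(total_ops / n / align) * align    # aligned slice size
    subs, remaining = [], total_ops
    while remaining > 0:
        subs.append(min(chunk, remaining))
        remaining -= subs[-1]
    return subs

print(split_layer(total_ops=1000, threshold=400))  # -> [384, 384, 232]
```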
S205, judging whether the current layer to be run has finished running; if not, executing S206 to S208; if so, executing S209.
S206, acquiring the operation rules of all sub-layers in the current layer that have not yet run.
S207, determining, based on each operation rule, a computing core whose operation condition matches that rule.
S208, sending each unrun sub-layer to its corresponding computing core so that the cores run the sub-layers.
S209, judging whether every layer to be run in the neural network has finished; if not, executing S210; if so, executing S211.
S210, determining the next layer in the preset running order as the current layer to be run, and returning to S205.
S211, outputting the operation result of the neural network.
Each sub-layer is the smallest task node a computing core runs, so the core able to run a sub-layer is determined from that sub-layer's operation rule, and each sub-layer is sent to its corresponding core. The cores run sub-layers of the same layer in parallel, while different layers of the same network run in sequence, which greatly improves the network's operation efficiency. And if the heterogeneous intelligent chip contains enough computing cores, multiple neural networks can execute in parallel, preserving efficiency across networks.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, dividing the network into layers that are easier to run and assigning each layer a suitable core makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
In addition, running sub-layers as the smallest task nodes on the heterogeneous intelligent chip makes the operation of the network finer-grained, improves its parallelism, and makes full utilization of the hardware computing resources possible; scheduling the split network tasks in real time gives multi-network parallelism good performance and adaptability.
The core functions of the embodiments of the invention are concentrated in the main core of the heterogeneous intelligent chip. The neural network operation method is described in detail below through the structure of the main core and the function of each of its modules. FIG. 4 shows the structure of the main core, which mainly includes a network parsing and layer splitting module 401, a scheduler module 402, and a computing core driver module 403.
The network parsing and layer splitting module 401 parses the neural network and splits its layers: its input is the neural network and its output is a network task chain whose nodes are split sub-tasks. Since the goal of the embodiments is full utilization of the computing cores, task splitting is the key point, and the splitting targets and strategies are mainly: split the network's computing task into units smaller than the network, because finer splitting favors balanced multi-network parallelism; split the computing task of a sequentially executed layer into several sub-tasks that can execute in parallel, so that even a single network has some parallel execution capability; and split the computing task into units better suited to the cores, e.g., units whose data satisfies certain alignment characteristics that help the cores execute more efficiently.
To achieve these three targets, module 401 splits as follows: divide the neural network in units of layers; continue splitting any layer whose computation is too large into several parallel sub-layers suited to core execution; and make the task unit a core executes each time one sub-layer. A concrete splitting example is shown in FIG. 3 and is not repeated here.
The scheduler module 402 is connected to the network parsing and layer splitting module 401. Its main function is to receive the network task chains generated by module 401, check the idle state of each computing core in real time, pick parallelizable task units from the several task chains, and issue them to a computing core for execution. The scheduler module 402 must maintain the running state of every task chain and parallelize the task execution of multiple networks as far as possible. Meanwhile, because different networks have different execution priorities and some must finish first, the scheduling policy must support ordering the networks, for example by raising the network priority of a network with strict real-time requirements so that its tasks execute first.
The scheduler module 402 actually drives the execution of the neural networks and is responsible for task scheduling and management; its performance determines the acceleration achieved by the method, so it should have a good scheduling policy algorithm. The policy mainly considers the following: the network layers of one neural network must execute in sequence; several sub-layers of one network layer of the same network may execute in parallel on different cores; network layers of different networks may execute in parallel on different cores; and higher-priority network tasks execute first. A sketch of an eligibility rule enforcing these constraints is given below.
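In the sketch, each task chain exposes only the sub-layers of its current layer, so a later layer is never offered before the current one finishes; the chain layout and names are illustrative, not from the patent.

```python
def runnable_tasks(task_chains):
    """task_chains: {net_name: {"priority": int, "current": [sub-layer, ...]}}.
    Yield tasks only from each chain's *current* layer, high priority first."""
    ordered = sorted(task_chains.items(),
                     key=lambda kv: kv[1]["priority"], reverse=True)
    for net, chain in ordered:
        for task in chain["current"]:   # sub-layers of one layer: parallel-safe
            yield net, task             # a later layer is never offered

chains = {"netA": {"priority": 5, "current": ["A0-0", "A0-1"]},
          "netB": {"priority": 1, "current": ["B3-0"]}}
print(list(runnable_tasks(chains)))
# -> [('netA', 'A0-0'), ('netA', 'A0-1'), ('netB', 'B3-0')]
```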
FIG. 5 shows the execution flow of the scheduler module 402, which is one loop: keep selecting a network from the several neural networks, select a task to execute from that network, and call a computing core to execute it. The detailed steps are:
S501, selecting one network from the neural networks currently to be executed. Selection takes the networks' priority attributes into account; a network with high real-time requirements is chosen first.
S502, judging the state of the network's current layer to be run: if it has not finished, executing S503; if it has, executing S505.
S503, selecting an unexecuted sub-layer from the currently unfinished network layer.
S504, querying the idle state of the available computing cores in real time, selecting a suitable core to execute the sub-layer, and returning to S501.
S505, judging whether all network layers of this network have finished: if so, executing S506; if not, executing S507.
S506, the network has finished: the program waiting for it is notified and can obtain the network output; execution returns to S501.
S507, the network has not finished: the next layer is selected for execution.
The computing core driver module 403 is connected to the scheduler module 402; its main function is to give the scheduler module 402 the ability to invoke a computing core, where each invocation covers one smallest task unit.
In this scheme, the main core splits the neural network into layers and sub-layers suited to core execution, and the sub-layers execute on the heterogeneous intelligent chip as the smallest task units. This makes execution finer-grained and improves parallelism: the network layers of a neural network are split into sub-layers that can execute in parallel, and network layers of different networks can also execute in parallel, making full utilization of the chip's hardware computing resources possible.
The scheduler module in the main core schedules the split network tasks in real time by managing which cores are idle or busy, and orders task scheduling by methods such as setting priorities. This gives multi-network parallelism good performance and adaptability, and satisfies the different real-time requirements of different networks.
Corresponding to the above method embodiments, an embodiment of the invention provides a neural network operation device, as shown in FIG. 6, including:
an acquisition module 610, configured to acquire a neural network to be run;
a network parsing module 620, configured to divide the neural network into network layers to obtain a plurality of layers to be run of the neural network; and
a scheduler module 630, configured to determine, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, the acquisition module 610 may be specifically configured to acquire, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
Optionally, the network parsing module 620 may be specifically configured to divide the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
Optionally, the scheduler module 630 may be specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rule of the current layer, determine, based on the rule, a computing core whose operation condition matches it, and send the current layer to that core so that the core runs it.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, dividing the network into layers that are easier to run and assigning each layer a suitable core makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
Based on the embodiment shown in FIG. 6, an embodiment of the invention further provides a neural network operation device, as shown in FIG. 7, including:
an acquisition module 710, configured to acquire a neural network to be run;
a network parsing module 720, configured to divide the neural network into network layers to obtain a plurality of layers to be run of the neural network;
a layer splitting module 730, configured to obtain the computation amount of each layer to be run by analysis, judge whether the computation amount of each layer is greater than a preset threshold, and, for any layer whose computation amount is greater than the preset threshold, split that layer to obtain a plurality of sub-layers; and
a scheduler module 740, configured to determine, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order.
Optionally, the scheduler module 740 may be specifically configured to: judge whether the current layer to be run has finished running; if not, acquire the operation rules of all sub-layers in the current layer that have not yet run, determine, based on each operation rule, a computing core whose operation condition matches that rule, and send each unrun sub-layer to its corresponding computing core so that the cores run the sub-layers.
By applying this embodiment, a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, this makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
In addition, running sub-layers as the smallest task nodes on the heterogeneous intelligent chip makes the operation of the network finer-grained, improves its parallelism, and makes full utilization of the hardware computing resources possible; scheduling the split network tasks in real time gives multi-network parallelism good performance and adaptability.
To improve the operation efficiency of neural networks, an embodiment of the invention further provides a heterogeneous intelligent chip, as shown in FIG. 8, including a main core 810, a plurality of computing cores 820, and a storage medium, where:
the storage medium is configured to store a computer program;
the main core 810 is configured to implement all the steps of the neural network operation method provided by the embodiments of the invention when executing the computer program stored on the storage medium, and to determine the computing cores that satisfy the preset operation condition; and
each computing core 820 is configured to run each layer to be run of the neural network in the preset running order.
The computing cores 820 may process in parallel or serially; they may be identical (e.g., all CPUs) or heterogeneous computing units comprising any two or more of CPU, GPU, FPGA, and ASIC.
The storage medium may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), for example at least one disk memory. It may be a storage medium independent of the main core and the computing cores of the heterogeneous intelligent chip, or it may be the memory of the main core or of a computing core.
The main core and the computing cores may be general-purpose processors, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; or a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In this embodiment, by reading and running the computer program stored in the storage medium, the main core of the heterogeneous intelligent chip acquires a neural network to be run, divides it into a plurality of layers to be run, and determines, for each layer, a computing core satisfying a preset operation condition, so that the computing cores run the layers in a preset running order. Since heterogeneous intelligent chips generally integrate one or more computing cores, dividing the network into layers that are easier to run and assigning each layer a suitable core makes full use of the chip's hardware computing capability and improves the operation efficiency of the whole network.
In addition, corresponding to the neural network operation method provided by the above embodiments, an embodiment of the invention provides a storage medium storing a computer program which, when executed by a main core, implements all the steps of the neural network operation method provided by the embodiments of the invention.
In this embodiment, the storage medium stores a computer program that, when run, executes the neural network operation method: a neural network to be run is acquired and divided into a plurality of layers to be run, and for each layer a computing core satisfying a preset operation condition is determined, so that the computing cores run the layers in a preset running order. This makes full use of the hardware computing capability of the heterogeneous intelligent chip and improves the operation efficiency of the whole network.
For the heterogeneous intelligent chip and storage medium embodiments, the description is relatively brief, since their content is substantially similar to the method embodiments; for relevant points, see the partial description of the method embodiments.
It should be noted that relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between them. Moreover, the terms "comprises", "comprising", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element qualified by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device comprising it.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments refer to one another, and each embodiment focuses on its differences from the others. In particular, the device, heterogeneous intelligent chip, and storage medium embodiments are described briefly because they are substantially similar to the method embodiments; for relevant points, see the partial description of the method embodiments.
The foregoing describes only preferred embodiments of the invention and is not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention falls within the protection scope of the invention.

Claims (8)

1. A neural network operation method, wherein the method is applied to a heterogeneous intelligent chip in which a plurality of processor cores with different instruction set architectures are integrated, including at least a plurality of computing cores, and the method comprises:
acquiring a neural network to be run;
dividing the neural network into network layers to obtain a plurality of layers to be run of the neural network, wherein, according to the complexity of the network layers, a plurality of connected network layers whose computations are coupled are divided into one layer to be run, or each network layer of the neural network is divided into one layer to be run; and
determining, for each layer to be run, a computing core that satisfies a preset operation condition, so that the computing cores run the layers to be run in a preset running order;
wherein determining, for each layer to be run, a computing core that satisfies a preset operation condition comprises:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rule of the current layer to be run; determining, based on the operation rule, a computing core whose operation condition matches the operation rule; and sending the current layer to be run to that computing core so that the computing core runs it, wherein the preset operation condition comprises: operation speed, operation stride, and response time;
wherein, after dividing the neural network into the plurality of layers to be run, the method further comprises:
obtaining the computation amount of each layer to be run by analysis, and judging whether the computation amount of each layer to be run is greater than a preset threshold; and
for any layer to be run whose computation amount is greater than the preset threshold, splitting that layer according to data alignment characteristics into a plurality of sub-layers that run in parallel, one computing core of the heterogeneous intelligent chip running one sub-layer;
and wherein determining, for each layer to be run, a computing core that satisfies a preset operation condition comprises:
judging whether the current layer to be run has finished running;
if the current layer to be run has not finished running, acquiring the operation rules of all sub-layers in the current layer that have not yet run; determining, based on each operation rule, a computing core whose operation condition matches that rule; and sending each unrun sub-layer to its corresponding computing core so that the computing cores run the sub-layers.
2. The method of claim 1, wherein acquiring the neural network to be run comprises:
acquiring, according to preset network priorities, the neural network with the highest priority from a plurality of neural networks to be run.
3. The method of claim 1, wherein dividing the neural network into network layers to obtain the plurality of layers to be run comprises:
dividing the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be run.
4. A neural network operation device, wherein a heterogeneous intelligent chip is integrated in the device, a plurality of processor cores with different instruction set architectures are integrated in the heterogeneous intelligent chip, and the device comprises at least a plurality of computing cores, the device comprising:
an acquisition module, used for acquiring the neural network to be operated;
a network analysis module, used for performing network layer division on the neural network to obtain a plurality of layers to be operated of the neural network, wherein, according to network layer complexity, a plurality of network layers involved in connected computation are divided into one layer to be operated, or each network layer in the neural network is divided into one layer to be operated;
a scheduler module, used for determining, for each layer to be operated, a computing core that meets the preset operation conditions, so that the computing cores run the layers to be operated in a preset operation order;
the scheduler module being specifically configured to:
judge whether the current layer to be operated has finished running;
if the current layer to be operated has not finished running, acquire the operation rule of the current layer to be operated; determine, based on the operation rule, a computing core whose operating conditions correspond to the operation rule; and transmit the current layer to be operated to that computing core so that the computing core runs it, wherein the preset operation conditions comprise: computation speed, computation step length and response time;
a layer splitting module, used for obtaining the computation amount of each layer to be operated through analysis and judging whether the computation amount of each layer is larger than a preset threshold; and, for a layer to be operated whose computation amount is larger than the preset threshold, splitting that layer, according to the data alignment characteristic, into a plurality of sublayers that run in parallel, one computing core of the heterogeneous intelligent chip correspondingly running one sublayer;
the scheduler module being further specifically configured to:
judge whether the current layer to be operated has finished running;
if the current layer to be operated has not finished running, acquire the operation rules of all sub-layers of the current layer to be operated that have not yet run; determine, based on each operation rule, a computing core whose operation conditions correspond to that rule; and send each non-run sub-layer to its corresponding computing core so that each computing core runs that sub-layer.
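The four modules of claim 4 could be wired together roughly as below. This is an illustrative composition only: the patent names the modules but not any concrete interfaces, so all class and method names are invented.

    class AcquisitionModule:
        def acquire(self, pending):
            return max(pending, key=lambda n: n["priority"])

    class NetworkAnalysisModule:
        def divide(self, network):
            # one layer to be operated per network layer
            return [[lyr] for lyr in network["layers"]]

    class LayerSplittingModule:
        def __init__(self, threshold):
            self.threshold = threshold  # assumed preset threshold

        def maybe_split(self, layer, n_cores):
            if layer["flops"] > self.threshold:
                layer["sublayers"] = [dict(layer, flops=layer["flops"] / n_cores)
                                      for _ in range(n_cores)]
            return layer

    class SchedulerModule:
        def __init__(self, pick_core):
            self.pick_core = pick_core  # core-matching policy, injected

        def dispatch(self, layers, cores):
            for layer in layers:        # preset operation order
                for part in layer.get("sublayers", [layer]):
                    core = self.pick_core(cores, part["rule"])
                    # send `part` to `core` here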
5. The apparatus of claim 4, wherein the acquisition module is specifically configured to:
acquire, according to a preset network priority, the neural network with the highest priority from among a plurality of neural networks to be operated.
6. The apparatus of claim 4, wherein the network analysis module is specifically configured to:
divide the neural network based on the number of network layers in the neural network to obtain the plurality of layers to be operated.
7. A heterogeneous intelligent chip, characterized by comprising a main core, a plurality of computing cores and a storage medium, wherein:
the storage medium is used for storing a computer program;
the main core is used for implementing the method steps of any one of claims 1 to 3 when executing the computer program stored on the storage medium, and for determining a computing core that meets the preset operation conditions;
each computing core is used for running each layer to be operated of the neural network according to the preset operation order.
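A rough Python sketch of the claim-7 division of labor, with queues standing in for the on-chip transport between the main core and the computing cores; the names and the hash-based core choice are placeholders, not the patent's mechanism.

    import queue
    import threading

    def main_core(layers, core_queues, pick_index):
        # the main core runs the scheduling method and assigns each
        # layer to the computing core meeting the preset conditions
        for layer in layers:             # preset operation order
            core_queues[pick_index(layer)].put(layer)
        for q in core_queues:
            q.put(None)                  # tell each core it is done

    def computing_core(q, run_layer):
        # each computing core runs the layers assigned to it, in order
        while True:
            layer = q.get()
            if layer is None:
                break
            run_layer(layer)

    qs = [queue.Queue() for _ in range(2)]
    workers = [threading.Thread(target=computing_core, args=(q, print))
               for q in qs]
    for w in workers:
        w.start()
    main_core(["conv1", "pool1", "fc1"], qs, lambda layer: hash(layer) % 2)
    for w in workers:
        w.join()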
8. A storage medium having stored therein a computer program which, when executed by a main core, implements the method steps of any one of claims 1 to 3.
CN201810757736.1A 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip Active CN110717574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810757736.1A CN110717574B (en) 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810757736.1A CN110717574B (en) 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip

Publications (2)

Publication Number Publication Date
CN110717574A CN110717574A (en) 2020-01-21
CN110717574B (en) 2023-07-07

Family

ID=69208951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810757736.1A Active CN110717574B (en) 2018-07-11 2018-07-11 Neural network operation method and device and heterogeneous intelligent chip

Country Status (1)

Country Link
CN (1) CN110717574B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488970A (en) * 2020-04-03 2020-08-04 北京思朗科技有限责任公司 Execution optimization method and device of neural network
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA
CN111737193B (en) * 2020-08-03 2020-12-08 深圳鲲云信息科技有限公司 Data storage method, device, equipment and storage medium
CN111985634B (en) * 2020-08-21 2024-06-14 北京灵汐科技有限公司 Operation method and device of neural network, computer equipment and storage medium
CN111814967B (en) * 2020-09-11 2021-02-23 鹏城实验室 Method, apparatus and storage medium for calculating inferential computation of neural network model
CN113158243A (en) * 2021-04-16 2021-07-23 苏州大学 Distributed image recognition model reasoning method and system
CN114647610B (en) * 2022-02-17 2022-11-29 北京百度网讯科技有限公司 Voice chip implementation method, voice chip and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956658A (en) * 2016-04-29 2016-09-21 北京比特大陆科技有限公司 Data processing method, data processing device and chip
CN107463990A (en) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 FPGA parallel acceleration method for convolutional neural networks
CN108090565A (en) * 2018-01-16 2018-05-29 电子科技大学 Parallelized training acceleration method for convolutional neural networks
CN108171117A (en) * 2017-12-05 2018-06-15 南京南瑞信息通信科技有限公司 Electric power artificial intelligence visual analysis system based on multi-core heterogeneous computing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779410B (en) * 2012-07-19 2014-08-06 杭州师范大学 Parallel implementation method of multi-source heterogeneous traffic data fusion
CN105607955A (en) * 2015-12-23 2016-05-25 浪潮集团有限公司 Calculation task distribution method and apparatus
CN107341545A (en) * 2017-07-25 2017-11-10 郑州云海信息技术有限公司 Deep neural network computing system and method
CN108228969A (en) * 2017-12-07 2018-06-29 中国航空工业集团公司西安航空计算技术研究所 Dual-FPGA collaborative working method for deep neural networks

Also Published As

Publication number Publication date
CN110717574A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN110717574B (en) Neural network operation method and device and heterogeneous intelligent chip
CN113254178B (en) Task scheduling method and device, electronic equipment and readable storage medium
US20200249998A1 (en) Scheduling computation graph heterogeneous computer system
CN110490309B (en) Operator fusion method for neural network and related product thereof
CN112711478B (en) Task processing method and device based on neural network, server and storage medium
CN110826708B (en) Method for realizing neural network model splitting by using multi-core processor and related product
US11609792B2 (en) Maximizing resource utilization of neural network computing system
CN112328380A (en) Task scheduling method and device based on heterogeneous computing
US20200184366A1 (en) Scheduling task graph operations
CN110689121A (en) Method for realizing neural network model splitting by using multi-core processor and related product
US20220414503A1 (en) Slo-aware artificial intelligence inference scheduler for heterogeneous processors in edge platforms
CN110308982A (en) Shared memory multiplexing method and device
CN117271101B (en) Operator fusion method and device, electronic equipment and storage medium
KR20220016859A (en) Method and apparatus for scheduling matrix jobs in digital processing system
CN112817730A (en) Deep neural network service batch processing scheduling method and system and GPU
Maruf et al. Extending resources for avoiding overloads of mixed‐criticality tasks in cyber‐physical systems
CN110928666B (en) Method and system for optimizing task parallelism based on memory in Spark environment
CN113127173B (en) Heterogeneous sensing cluster scheduling method and device
CN114662932A (en) Node-hierarchical workflow timing task scheduling method
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
KR20210023401A (en) Neural network computing method and system including the computing method
CN111985634B (en) Operation method and device of neural network, computer equipment and storage medium
Maste et al. Intelligent dynamic time quantum allocation in MLFQ scheduling
Zhang et al. A locally distributed mobile computing framework for DNN based android applications
Yang et al. Study on static task scheduling based on heterogeneous multi-core processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant