WO2020164469A1 - Neural network calculation method and apparatus, mobile terminal and storage medium - Google Patents


Info

Publication number
WO2020164469A1
PCT/CN2020/074719 · CN2020074719W
Authority
WO
WIPO (PCT)
Prior art keywords
operators
operator
executed
neural network
operator sets
Application number
PCT/CN2020/074719
Other languages
French (fr)
Chinese (zh)
Inventor
刘耀勇
陈岩
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Publication of WO2020164469A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A neural network calculation method and apparatus, a mobile terminal and a storage medium. The method comprises: a mobile terminal acquiring M operators to be executed and calculating the dependencies between the M operators to be executed, wherein M is an integer greater than or equal to two (101); the mobile terminal cutting the M operators to be executed according to the dependencies between them so as to obtain N operator sets, each of the N operator sets comprising at least one operator, and N being an integer greater than or equal to two (102); and, if the N operator sets are mutually independent operator sets, the mobile terminal enabling N threads to respectively calculate the operators in the N operator sets (103). The described method can reduce the inference time of a neural network.

Description

Neural network calculation method, device, mobile terminal and storage medium
Technical field
This application relates to the field of communication technology, and in particular to a neural network calculation method, device, mobile terminal, and storage medium.
Background
In current neural network algorithm frameworks (for example, TensorFlow Lite), when a neural network computation is performed, all operators that need to be executed are added to a single pending queue, and the processor then calls and executes them in turn; that is, the operators are executed sequentially on one thread. As neural networks grow more complex and the number of operators increases, the inference time of the neural network grows accordingly.
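The serial scheduling described above can be sketched in a few lines of Python. The operator callables and their names here are illustrative stand-ins: a real framework such as TensorFlow Lite dispatches compiled kernels, not Python functions.

```python
from collections import deque

# Illustrative stand-ins for framework operators.
def op_a(x): return x + 1
def op_b(x): return x * 2
def op_c(x): return x - 3

def run_serial(operators, value):
    """All operators go into one pending queue and run one after
    another on a single thread -- the behaviour the patent improves on."""
    queue = deque(operators)
    while queue:
        op = queue.popleft()
        value = op(value)
    return value

run_serial([op_a, op_b, op_c], 5)  # ((5 + 1) * 2) - 3 = 9
```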
Summary of the invention
The embodiments of the present application provide a neural network calculation method, device, mobile terminal, and storage medium, which can reduce the inference time of a neural network.
In a first aspect, an embodiment of the present application provides a neural network calculation method based on a neural network algorithm framework, including:
acquiring M operators to be executed, and calculating the dependencies between the M operators to be executed, where M is an integer greater than or equal to 2;
cutting the M operators to be executed according to the dependencies between them to obtain N operator sets, each of the N operator sets including at least one operator, where N is an integer greater than or equal to 2; and
if the N operator sets are mutually independent operator sets, enabling N threads to respectively calculate the operators in the N operator sets.
In a second aspect, an embodiment of the present application provides a neural network computing device. The device includes a communication unit and a processing unit, wherein:
the communication unit is configured to acquire M operators to be executed; and
the processing unit is configured to calculate the dependencies between the M operators to be executed, where M is an integer greater than or equal to 2; to cut the M operators to be executed according to those dependencies to obtain N operator sets, each of the N operator sets including at least one operator, where N is an integer greater than or equal to 2; and, when the N operator sets are mutually independent operator sets, to enable N threads to respectively calculate the operators in the N operator sets.
In a third aspect, an embodiment of the present application provides a mobile terminal including a processor and a memory. The memory is configured to store one or more programs, the one or more programs are configured to be executed by the processor, and the programs include instructions for performing the steps of the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that in the neural network calculation method based on a neural network algorithm framework described in the embodiments of this application, when a neural network computation is performed, M operators to be executed are acquired and the dependencies between them are calculated, where M is an integer greater than or equal to 2; the M operators are cut according to those dependencies to obtain N operator sets, each including at least one operator, where N is an integer greater than or equal to 2; and if the N operator sets are mutually independent, N threads are enabled to respectively calculate the operators in the N operator sets. Because the N independent operator sets are computed by N threads simultaneously, the speed of the neural network computation is increased, and the inference time of the neural network is therefore reduced.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a neural network calculation method based on a neural network algorithm framework disclosed in an embodiment of the present application;
Fig. 2 is a schematic diagram of dependencies between operators disclosed in an embodiment of the present application;
Fig. 3 is a schematic flowchart of another neural network calculation method based on a neural network algorithm framework disclosed in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a neural network computing device disclosed in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a mobile terminal disclosed in an embodiment of the present application.
Detailed description
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of the present invention are used to distinguish different objects rather than to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present invention. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The mobile terminals involved in the embodiments of this application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on. For ease of description, the devices mentioned above are collectively referred to as mobile terminals.
The embodiments of the present application are described in detail below.
Please refer to Fig. 1, which is a schematic flowchart of a neural network calculation method based on a neural network algorithm framework disclosed in an embodiment of the present application. As shown in Fig. 1, the method includes the following steps.
101. The mobile terminal acquires M operators to be executed and calculates the dependencies between the M operators to be executed, where M is an integer greater than or equal to 2.
In the embodiments of this application, the neural network algorithm framework may be TensorFlow or TensorFlow Lite. TensorFlow is a framework for training and running neural network models on a personal computer (PC). TensorFlow Lite is a framework for training and running neural network models on a mobile terminal, which may run iOS or Android.
The neural network algorithm framework may include a controller unit, an arithmetic unit, and a storage unit. The controller unit is used to store and process instructions, the arithmetic unit is used to compute operators, and the storage unit is used to store neurons, weights, and so on. In a neural network model, an operator represents one kind of computation; for example, addition, subtraction, multiplication, and division are four operators. When neural network inference is performed, multiple operators need to be computed; at present all operators are executed serially, which leads to a long inference time.
In the embodiments of this application, when neural network inference is performed, multiple operators need to be computed. After acquiring the M operators to be executed, the controller unit calculates the dependencies between them. The M operators to be executed may be the operators that need to be executed in the entire inference process, the operators that need to be executed in the computation of one layer of the network, or only some of the operators needed in the computation of one layer.
The operators in the embodiments of this application may include the Conv2D operator, the FusedBatchNorm operator, the Relu operator, the DepthwiseConv2dNative operator, the MaxPool operator, the BiasAdd operator, the ConcatV2 operator, and so on.
The Conv2D operator computes a two-dimensional convolution between given four-dimensional input data and a four-dimensional filter tensor, which may also be called a four-dimensional convolution kernel tensor. The Conv2D operator specifies that the four-dimensional input data includes the number of training samples (batch), the input height (inputHeight), the input width (inputWidth), and the number of input channels (inputChannel). The four-dimensional filter tensor includes the filter height (filterHeight), the filter width (filterWidth), the number of filter channels (filterChannel), and the number of filters (filterNumber). The Conv2D operator slides the four-dimensional filter tensor over the four-dimensional input data with certain strides, performing multiply-accumulate operations to obtain the two-dimensional convolution result.
The FusedBatchNorm operator is frequently used in deep neural networks to accelerate training; it speeds up convergence and improves stability, and is currently an indispensable component of deep neural networks.
The Relu operator, also known as the ReLU function, stands for "rectified linear unit"; it takes the maximum of the convolved input x and zero, i.e. max(x, 0). The ReLU operator sets all negative values in the matrix x to zero and leaves the remaining values unchanged; it is applied after the convolution operation.
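As a concrete illustration, a minimal ReLU over a nested-list matrix (plain Python for clarity; frameworks implement this as an optimized kernel):

```python
def relu(matrix):
    """Element-wise max(x, 0): negative entries become zero, others unchanged."""
    return [[max(v, 0.0) for v in row] for row in matrix]

relu([[-1.5, 2.0], [3.0, -4.0]])  # [[0.0, 2.0], [3.0, 0.0]]
```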
The DepthwiseConv2dNative operator likewise computes a two-dimensional convolution between given four-dimensional input data and a four-dimensional filter tensor (also called a four-dimensional convolution kernel tensor). The four-dimensional input data includes the number of training samples (batch), the input height (inputHeight), the input width (inputWidth), and the number of input channels (inputChannel). The four-dimensional filter tensor includes the filter height (filterHeight), the filter width (filterWidth), the number of filter channels (filterChannel), and a channel multiplier (channel_multiplier). The operator slides the filter tensor over the input data with certain strides, performing multiply-accumulate operations to obtain the two-dimensional convolution result.
The MaxPool operator is one kind of pooling operator, an algorithm that discards part of the data in the convolution result.
The BiasAdd operator is a bias operator: it adds a vector called bias to a matrix called value, adding the vector to every row of the matrix, so the result has the same size as the value matrix. The BiasAdd operator performs an addition operation.
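The row-wise addition that BiasAdd performs can be illustrated as follows (plain Python; `bias_add` is an illustrative helper, not the framework's actual API):

```python
def bias_add(value, bias):
    """Add the bias vector to every row of the value matrix; the result
    has the same shape as the value matrix."""
    assert all(len(row) == len(bias) for row in value)
    return [[v + b for v, b in zip(row, bias)] for row in value]

bias_add([[1, 2], [3, 4]], [10, 20])  # [[11, 22], [13, 24]]
```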
The ConcatV2 operator is an operation that concatenates two matrices: it merges the two matrices, and the merged matrix gains rows or columns.
Dependencies may exist between different operators; for example, only after the Conv2D operator has been executed can the activation operator, pooling operator, normalization operator, and so on be executed. The mobile terminal can determine the dependency between operators according to the order in which they must be executed.
For example, please refer to Fig. 2, which is a schematic diagram of dependencies between operators disclosed in an embodiment of the present application. As shown in Fig. 2, suppose there are 8 operators to be executed: the first through eighth operators. The second and fifth operators can be executed only after the first operator; the third operator only after the second; the fourth operator only after the third; the sixth operator only after the fifth; the seventh operator only after the sixth; and the eighth operator only after both the fourth and seventh operators. It can be seen from Fig. 2 that the first, second, third, fourth, and eighth operators form one dependency chain, and the first, fifth, sixth, seventh, and eighth operators form another. The chain of the second, third, and fourth operators and the chain of the fifth, sixth, and seventh operators are mutually independent: there is no strict execution order between the two.
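The dependency structure of this example can be written down as a dependency map. The names `op1`..`op8` are illustrative stand-ins for the first through eighth operators of Fig. 2:

```python
# Each operator lists the operators that must finish before it may run.
depends_on = {
    "op1": [],
    "op2": ["op1"], "op3": ["op2"], "op4": ["op3"],
    "op5": ["op1"], "op6": ["op5"], "op7": ["op6"],
    "op8": ["op4", "op7"],
}

def ready(done):
    """Operators whose prerequisites have all finished."""
    return [op for op, deps in depends_on.items()
            if op not in done and all(d in done for d in deps)]

ready({"op1"})  # ['op2', 'op5'] -- the two independent chains may start
```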
102. The mobile terminal cuts the M operators to be executed according to the dependencies between them to obtain N operator sets, each of the N operator sets including at least one operator, where N is an integer greater than or equal to 2.
In the embodiments of this application, the mobile terminal may cut the M operators to be executed with a certain cutting algorithm according to the dependencies between them, obtaining N operator sets in a way that minimizes the dependence between the sets, so that as many of the N operator sets as possible are mutually independent. Taking Fig. 2 as an example, the 8 operators to be executed can be cut into 4 operator sets: the first set includes the first operator; the second set includes the second, third, and fourth operators; the third set includes the fifth, sixth, and seventh operators; and the fourth set includes the eighth operator. The first set has dependencies with the second and third sets, the fourth set has dependencies with the second and third sets, and the second and third sets are mutually independent.
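The four-set cut of the Fig. 2 example, together with an independence check between sets, can be sketched as follows. The check inspects direct dependencies only, which suffices for this example; the names `op1`..`op8` are illustrative stand-ins:

```python
depends_on = {
    "op1": [], "op2": ["op1"], "op3": ["op2"], "op4": ["op3"],
    "op5": ["op1"], "op6": ["op5"], "op7": ["op6"], "op8": ["op4", "op7"],
}
# The four operator sets described in the text.
sets = [{"op1"}, {"op2", "op3", "op4"}, {"op5", "op6", "op7"}, {"op8"}]

def independent(a, b):
    """Two sets are independent when no operator in one depends
    (directly) on an operator in the other."""
    return (not any(d in b for op in a for d in depends_on[op]) and
            not any(d in a for op in b for d in depends_on[op]))

independent(sets[1], sets[2])  # True  -- the two chains can run in parallel
independent(sets[0], sets[1])  # False -- op2 depends on op1
```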
103. If the N operator sets are mutually independent operator sets, the mobile terminal enables N threads to respectively calculate the operators in the N operator sets.
In the embodiments of this application, if the N operator sets are mutually independent, there is no dependency between them and no set needs to be executed before another. The mobile terminal can therefore enable N threads to calculate the operators in the N operator sets simultaneously, which increases the speed of the neural network computation and thus reduces the inference time of the neural network.
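Running independent operator sets on separate threads can be sketched with Python's `threading` module. The two operator sets and their callables below are illustrative stand-ins for real kernels:

```python
import threading

# Each thread runs the operators of one independent set in order and
# writes its result into its own slot.
def run_set(ops, results, slot):
    value = 0
    for op in ops:
        value = op(value)
    results[slot] = value

op_sets = [
    [lambda v: v + 1, lambda v: v * 3],   # set 1: (0 + 1) * 3 = 3
    [lambda v: v + 10, lambda v: v - 4],  # set 2: (0 + 10) - 4 = 6
]

results = [None] * len(op_sets)
threads = [threading.Thread(target=run_set, args=(ops, results, i))
           for i, ops in enumerate(op_sets)]
for t in threads:
    t.start()
for t in threads:
    t.join()
results  # [3, 6]
```

Note that in CPython the global interpreter lock limits true parallelism of Python-level code; real frameworks achieve parallel speedup by running native kernels on worker threads.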
Optionally, step 102 may include the following step:
the mobile terminal cuts the M operators to be executed with a graph-partitioning algorithm according to the dependencies between them, obtaining N operator sets.
With a graph-partitioning algorithm, the directed graph can be divided accurately so that the dependence between the N operator sets is as small as possible, which increases the number of operator sets that can be executed in parallel and thus the speed of operator computation.
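As a toy illustration of the partitioning idea (not a real graph-partitioning algorithm, which is considerably more involved): removing the shared fork and join operators of the Fig. 2 example leaves two weakly-connected chains that are mutually independent. Node names are illustrative:

```python
from collections import defaultdict

edges = [("op1", "op2"), ("op2", "op3"), ("op3", "op4"),
         ("op1", "op5"), ("op5", "op6"), ("op6", "op7"),
         ("op4", "op8"), ("op7", "op8")]
cut_nodes = {"op1", "op8"}  # the shared fork and join operators

# Build an undirected adjacency map over the remaining nodes.
adj = defaultdict(set)
for u, v in edges:
    if u not in cut_nodes and v not in cut_nodes:
        adj[u].add(v)
        adj[v].add(u)

def components(nodes):
    """Connected components of the remaining graph, via DFS."""
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

nodes = {n for e in edges for n in e} - cut_nodes
components(sorted(nodes))  # [{'op2', 'op3', 'op4'}, {'op5', 'op6', 'op7'}]
```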
Optionally, after step 101, the following step may also be performed:
the mobile terminal obtains a directed graph of the M operators to be executed according to the dependencies between them.
Cutting the M operators to be executed with a graph-partitioning algorithm according to the dependencies between them to obtain N operator sets then specifically includes:
the mobile terminal cuts the directed graph of the M operators to be executed with the graph-partitioning algorithm according to the dependencies between them, obtaining N directed subgraphs, where each directed subgraph corresponds to one operator set.
The dependency diagram shown in Fig. 2 may also be called a directed graph, in which the rectangular boxes represent operators and the connecting lines between them represent dependencies. The rectangular boxes can be abstracted as the vertices of the directed graph and the connecting lines as its edges. The operator at the end of a connecting line (the head of the arrow) can be computed only after the operator at its start (the tail of the arrow) has been computed. A directed graph intuitively reflects the dependencies between operators, which facilitates the subsequent division into operator sets.
Taking Fig. 2 as an example, the mobile terminal cuts the directed graph of the 8 operators to be executed with the graph-partitioning algorithm according to the dependencies between them. Specifically, the first node of the directed graph is cut from the second and fifth nodes, and the eighth node is cut from the fourth and seventh nodes, giving 4 directed subgraphs. The first through eighth nodes of the directed graph correspond to the first through eighth operators respectively. The 4 directed subgraphs are the first, second, third, and fourth directed subgraphs. The first directed subgraph includes only the first node of the directed graph; the second includes the second, third, and fourth nodes together with the connecting lines between the second and third nodes and between the third and fourth nodes; the third includes the fifth, sixth, and seventh nodes together with the connecting lines between the fifth and sixth nodes and between the sixth and seventh nodes; the fourth includes only the eighth node. The first directed subgraph has dependencies with the second and third directed subgraphs, the fourth has dependencies with the second and third, and the second and third directed subgraphs are independent of each other.
In the embodiments of this application, the dependencies of the operators that need to be executed in the inference process of the neural network model are first calculated, and the operators are cut according to those dependencies. When the N operator sets obtained by the cut are mutually independent, N threads are enabled to calculate the operators in the N operator sets simultaneously, which increases the speed of the neural network computation and thus reduces the inference time of the neural network.
Please refer to FIG. 3, which is a schematic flowchart of another neural network calculation method based on a neural network algorithm framework disclosed in an embodiment of the present application. FIG. 3 is obtained by further optimization on the basis of FIG. 1. As shown in FIG. 3, the neural network calculation method based on the neural network algorithm framework includes the following steps.
301: The mobile terminal obtains M operators to be executed and calculates the dependency relationships between the M operators to be executed, where M is an integer greater than or equal to 2.
302: The mobile terminal cuts the M operators to be executed according to the dependency relationships between them to obtain N operator sets, where each of the N operator sets includes at least 1 operator and N is an integer greater than or equal to 2.
303: If the N operator sets are mutually independent operator sets, the mobile terminal starts N threads to calculate the operators in the N operator sets respectively.
For the specific implementation of steps 301 to 303 in the embodiment of the present application, reference may be made to the detailed description of steps 101 to 103 shown in FIG. 1, which will not be repeated here.
304: If the N operator sets are not mutually independent operator sets, the mobile terminal uses a forward-backward alternating iterative scheduling algorithm to determine, according to the dependency relationships between the N operator sets, which operators in the N operator sets need to be executed in parallel and which need to be executed serially.
305: The mobile terminal determines the execution order of the operators to be executed in parallel and the operators to be executed serially, and schedules those operators in the N operator sets for calculation accordingly.
In the embodiment of the present application, the forward-backward alternating iterative scheduling algorithm, also known as the CAP-FB algorithm, is a node scheduling algorithm. The embodiment of the present application provides a node scheduling scheme that shortens the parallel execution time of the operators, which increases their parallel execution speed and hence the speed of the neural network calculation, thereby reducing the inference time of the neural network.
FIG. 2 is used below to illustrate which operators in the N operator sets need to be executed in parallel and which need to be executed serially. In FIG. 2, the first operator set includes the first operator; the second operator set includes the second, third, and fourth operators; the third operator set includes the fifth, sixth, and seventh operators; and the fourth operator set includes the eighth operator. The execution order of the 8 operators is as follows: the first operator is executed first; after the first operator finishes, the second operator and the fifth operator are executed in parallel; after the second operator finishes, the third operator is executed, and after the third operator finishes, the fourth operator is executed; after the fifth operator finishes, the sixth operator is executed, and after the sixth operator finishes, the seventh operator is executed; after both the fourth operator and the seventh operator finish, the eighth operator is executed last. The operator sets that need to be executed serially are the first operator set and the fourth operator set, and the operator sets that need to be executed in parallel are the second operator set and the third operator set.
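The execution order above can be reproduced with a simple level-based scheduler: a set becomes ready once every set it depends on has finished, and all ready sets run in parallel. Note that this is only an illustrative approximation of the scheduling result for FIG. 2, not an implementation of the CAP-FB algorithm itself, whose internal details the application does not give.

```python
from concurrent.futures import ThreadPoolExecutor

def schedule_sets(all_sets, deps):
    """Run operator sets level by level: a set becomes ready once every
    set it depends on has finished; ready sets execute in parallel."""
    done, order = set(), []
    while len(done) < len(all_sets):
        ready = [s for s in all_sets
                 if s not in done and deps.get(s, set()) <= done]
        with ThreadPoolExecutor(max_workers=len(ready)) as pool:
            list(pool.map(lambda s: None, ready))  # placeholder work per set
        done |= set(ready)
        order.append(sorted(ready))
    return order

# FIG. 2: set1 runs first, set2 and set3 run in parallel, set4 runs last.
deps = {"set1": set(), "set2": {"set1"},
        "set3": {"set1"}, "set4": {"set2", "set3"}}
print(schedule_sets(["set1", "set2", "set3", "set4"], deps))
# → [['set1'], ['set2', 'set3'], ['set4']]
```

The resulting levels show the serial sets (first and fourth) bracketing the two parallel sets (second and third), matching the order described for FIG. 2.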
It should be noted that FIG. 2 is a simple directed graph given as an example for ease of understanding. In an actual neural network calculation, the number of operators may be in the tens of thousands or more, and the dependency relationships between operators are far more complex, so a forward-backward alternating iterative scheduling algorithm is needed to schedule the execution order of the operators and achieve the best calculation speed.
Optionally, the mobile terminal schedules the operators that need to be executed in parallel and the operators that need to be executed serially in the N operator sets for calculation as follows:

The mobile terminal determines a scheduling strategy and, according to the scheduling strategy, schedules the operators that need to be executed in parallel and the operators that need to be executed serially in the N operator sets for calculation. The scheduling strategy is any one of an energy-consumption-priority strategy, a speed-priority strategy, and a balanced strategy.
The energy-consumption-priority strategy focuses on reducing calculation energy consumption as much as possible. The speed-priority strategy focuses on increasing calculation speed, maximizing it on the basis of the available computing resources. The balanced strategy takes both calculation energy consumption and calculation speed into account, reducing energy consumption as much as possible on the premise that the calculation speed reaches a certain threshold. Different scheduling strategies suit different scenarios. For example, when the battery level of the mobile terminal is below a certain threshold, the energy-consumption-priority strategy may be adopted. When the mobile terminal has no calculation with a higher priority than the neural network calculation, the speed-priority strategy may be adopted. When neither of these two scenarios applies, the balanced strategy may be adopted. The embodiment of the present application can thus adopt different scheduling strategies for different scenarios to meet the neural network calculation requirements of each.
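The scenario-based choice of strategy can be sketched as a simple decision rule. The threshold value, parameter names, and strategy labels below are illustrative assumptions, since the application describes the scenarios only qualitatively.

```python
def choose_strategy(battery_level, has_higher_priority_task,
                    low_battery_threshold=0.2):
    """Pick a scheduling strategy from the current scenario.
    All thresholds and names here are illustrative assumptions."""
    if battery_level < low_battery_threshold:
        return "energy_first"   # minimize calculation energy consumption
    if not has_higher_priority_task:
        return "speed_first"    # maximize calculation speed
    return "balanced"           # trade speed against energy consumption

print(choose_strategy(0.1, False))  # → energy_first
print(choose_strategy(0.8, False))  # → speed_first
print(choose_strategy(0.8, True))   # → balanced
```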
Optionally, before the mobile terminal determines the scheduling strategy, the method may further include the following step:

The mobile terminal obtains the memory resources and processing circuit resources available for the neural network calculation.

The mobile terminal then determines the scheduling strategy as follows:

The mobile terminal determines the scheduling strategy according to the memory resources and processing circuit resources available for the neural network calculation.
In the embodiment of the present application, the mobile terminal may have dedicated computing resources for processing the neural network calculation, or may process it directly on the central processing unit. If the central processing unit is used directly, the memory resources and processing circuit resources the mobile terminal can allocate to the neural network calculation are relatively limited. When the allocated memory resources and processing circuit resources are plentiful, the speed-priority strategy may be adopted; when they are scarce, the energy-consumption-priority strategy or the balanced strategy may be adopted. The embodiment of the present application can thus adjust the scheduling strategy according to the amount of memory and processing circuit resources allocated to the neural network calculation, so as to meet the calculation requirements under different hardware resource conditions.
Optionally, before step 303 is performed, the following step may also be performed:

The mobile terminal estimates the expected execution time of a first operator, where the first operator is an operator in any one of the N operator sets.

Optionally, after step 303 is performed, the following step may also be performed:

The mobile terminal obtains the actual execution time of the first operator and corrects the expected execution time of the first operator.
In the embodiment of the present application, when the neural network model runs for the first time, the execution time of each operator differs; even the same operator takes a different time when it processes a different amount of data. Before the first operator has ever been executed, its expected execution time is a preset value. Each time the first operator is executed, its actual execution time is obtained and the expected execution time is corrected once, so that an accurate expected execution time for the first operator is obtained gradually.
For example, consider processing images with the neural network model. Before the first image frame is calculated, the execution times of all operators are assumed to be the same and serve as a base time. When the next frame is processed, the recorded execution time of each operator is corrected (updated) with its actual execution time. The more image frames are processed, the more accurate the corrected execution time of each operator becomes, so the execution time of the operators can be predicted more accurately, providing accurate data for subsequent scheduling between operators and thereby improving the efficiency of operator scheduling and execution.
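The frame-by-frame correction can be sketched as a running blend of the previous estimate and the newly measured time. The blending factor is an illustrative assumption; the application only states that each measurement refines the estimate.

```python
def update_estimate(estimate, actual, alpha=0.5):
    """Blend the previous estimate with the newly measured execution time.
    The blending factor alpha is an illustrative assumption."""
    return (1 - alpha) * estimate + alpha * actual

# All operators start from the same preset base time; each processed
# frame nudges the estimate toward the operator's real cost.
estimate = 10.0                     # preset base time (ms)
for measured in (4.0, 4.2, 3.8):    # measured per-frame execution times
    estimate = update_estimate(estimate, measured)
print(round(estimate, 2))           # → 4.7
```

After three frames the estimate has moved from the arbitrary base time of 10.0 ms to roughly the operator's true cost of about 4 ms, mirroring how more processed frames yield a more accurate prediction.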
In the embodiment of the present application, the dependency relationships of the operators that need to be executed during the inference process of the neural network model are first calculated, and the operators to be executed are cut according to those dependency relationships. When the N operator sets obtained by the cutting are mutually independent, N threads are started to calculate the operators in the N operator sets simultaneously, which increases the speed of the neural network calculation and thereby reduces the inference time of the neural network. When the N operator sets are not mutually independent, a forward-backward alternating iterative scheduling algorithm determines, according to the dependency relationships between the N operator sets, which operators need to be executed in parallel and which need to be executed serially; the execution order of these operators is determined, and the operators in the N operator sets are scheduled for calculation accordingly. Scheduling the operators with the forward-backward alternating iterative scheduling algorithm shortens their parallel execution time, which increases their parallel execution speed and hence the speed of the neural network calculation, thereby reducing the inference time of the neural network.
The foregoing mainly introduces the solution of the embodiment of the present application from the perspective of the method-side execution process. It can be understood that, in order to implement the above functions, the mobile terminal includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily realize that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The embodiment of the present application may divide the mobile terminal into functional units according to the foregoing method examples. For example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative and is merely a division by logical function; other division methods are possible in actual implementation.
Please refer to FIG. 4, which is a schematic structural diagram of a neural network computing device disclosed in an embodiment of the present application. As shown in FIG. 4, the neural network computing device is applied to a neural network algorithm framework that includes a plurality of tensor (Tensor) units. The neural network computing device 400 includes a communication unit 401 and a processing unit 402, wherein:
The communication unit 401 is configured to obtain M operators to be executed.

The processing unit 402 is configured to calculate the dependency relationships between the M operators to be executed, where M is an integer greater than or equal to 2; to cut the M operators to be executed according to the dependency relationships between them to obtain N operator sets, where each of the N operator sets includes at least 1 operator and N is an integer greater than or equal to 2; and, when the N operator sets are mutually independent operator sets, to start N threads to calculate the operators in the N operator sets respectively.
Optionally, the processing unit 402 cuts the M operators to be executed according to the dependency relationships between them to obtain the N operator sets as follows: according to the dependency relationships between the M operators to be executed, a graph partitioning algorithm is used to cut the M operators to be executed to obtain the N operator sets.
Optionally, after calculating the dependency relationships between the M operators to be executed, the processing unit 402 is further configured to obtain a directed graph among the M operators to be executed according to those dependency relationships.

The processing unit 402 uses the graph partitioning algorithm to cut the M operators to be executed according to the dependency relationships between them to obtain the N operator sets as follows: according to the dependency relationships between the M operators to be executed, the graph partitioning algorithm is used to cut the directed graph among the M operators to be executed to obtain N directed subgraphs, where each directed subgraph corresponds to one operator set.
Optionally, the processing unit 402 is further configured to, when the N operator sets are not mutually independent operator sets, use the forward-backward alternating iterative scheduling algorithm to determine, according to the dependency relationships between the N operator sets, which operators in the N operator sets need to be executed in parallel and which need to be executed serially; to determine the execution order of those operators; and to schedule the operators that need to be executed in parallel and the operators that need to be executed serially in the N operator sets for calculation.
Optionally, the processing unit 402 schedules the operators that need to be executed in parallel and the operators that need to be executed serially in the N operator sets for calculation as follows: a scheduling strategy is determined, and according to the scheduling strategy the operators that need to be executed in parallel and the operators that need to be executed serially in the N operator sets are scheduled for calculation; the scheduling strategy is any one of an energy-consumption-priority strategy, a speed-priority strategy, and a balanced strategy.
Optionally, the processing unit 402 is further configured to obtain the memory resources and processing circuit resources available for the neural network calculation before determining the scheduling strategy.

The processing unit 402 determines the scheduling strategy as follows: the scheduling strategy is determined according to the memory resources and processing circuit resources available for the neural network calculation.
Optionally, the processing unit 402 is further configured to estimate the expected execution time of a first operator before the N threads are started to calculate the operators in the N operator sets, where the first operator is an operator in any one of the N operator sets.

The processing unit 402 is further configured to obtain the actual execution time of the first operator after the N threads are started to calculate the operators in the N operator sets, and to correct the expected execution time of the first operator.
The communication unit 401 in FIG. 4 may be a communication interface, and the processing unit 402 may be a processor. The neural network computing device shown in FIG. 4 may further include a storage unit 403, which may be a memory (for example, a non-volatile memory).
By implementing the neural network computing device shown in FIG. 4, the dependency relationships of the operators that need to be executed during the inference process of the neural network model can be calculated, and the operators to be executed can be cut according to those dependency relationships. When the N operator sets obtained by the cutting are mutually independent, N threads are started to calculate the operators in the N operator sets simultaneously, which increases the speed of the neural network calculation and thereby reduces the inference time of the neural network.
Please refer to FIG. 5, which is a schematic structural diagram of a mobile terminal disclosed in an embodiment of the present application. As shown in FIG. 5, the mobile terminal 500 includes a processor 501 and a memory 502, and may further include a bus 503 through which the processor 501 and the memory 502 are connected to each other. The bus 503 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus. The mobile terminal 500 may also include an input/output device 504, which may include a display screen, such as a liquid crystal display. The memory 502 is used to store one or more programs containing instructions; the processor 501 is used to call the instructions stored in the memory 502 to execute some or all of the method steps in FIG. 2 to FIG. 3.
By implementing the mobile terminal shown in FIG. 5, the dependency relationships of the operators that need to be executed during the inference process of the neural network model can be calculated, and the operators to be executed can be cut according to those dependency relationships. When the N operator sets obtained by the cutting are mutually independent, N threads are started to calculate the operators in the N operator sets simultaneously, which increases the speed of the neural network calculation and thereby reduces the inference time of the neural network.
An embodiment of the present application also provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any neural network calculation method based on a neural network algorithm framework described in the foregoing method embodiments.

An embodiment of the present application also provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute some or all of the steps of any neural network calculation method based on a neural network algorithm framework described in the foregoing method embodiments.
It should be noted that, for the sake of simple description, each of the foregoing method embodiments is expressed as a series of action combinations, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention, certain steps may be performed in another order or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own focus. For parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is only a division by logical function, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention essentially, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method described in each embodiment of the present invention. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been described in detail above, and specific examples are used herein to illustrate the principles and implementation of the present invention. The descriptions of the above embodiments are only intended to help understand the method and core ideas of the present invention. Meanwhile, persons of ordinary skill in the art, based on the idea of the present invention, may make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (20)

  1. A neural network calculation method based on a neural network algorithm framework, characterized in that the method comprises:
    获取M个待执行算子,计算所述M个待执行算子之间的依赖关系,N为大于或等于2的整数;Acquiring M to-be-executed operators, and calculating the dependency relationship between the M to-be-executed operators, where N is an integer greater than or equal to 2;
    依据所述M个待执行算子之间的依赖关系对所述M个待执行算子进行切割,得到N个算子集合,所述N个算子集合中的每个算子集合至少包括1个算子,N为大于或等于2的整数;Cut the M to-be-executed operators according to the dependency relationship between the M to-be-executed operators to obtain N operator sets, each of the N operator sets includes at least 1 Operators, N is an integer greater than or equal to 2;
    若所述N个算子集合为相互独立的算子集合,启用N个线程分别对所述N个算子集合中的算子进行计算。If the N operator sets are mutually independent operator sets, N threads are activated to perform calculations on the operators in the N operator sets respectively.
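Claim 1 can be pictured with a short sketch. This is only an illustrative reading, not the patented implementation: independent operator sets are recovered here as connected components of the (undirected) dependency graph, and each set is computed on its own thread. All operator ids and helper names are hypothetical.

```python
from collections import defaultdict
from threading import Thread

def partition_operators(ops, deps):
    """ops: list of operator ids; deps: list of (a, b) meaning b depends on a.
    Returns connected components -- mutually independent operator sets."""
    adj = defaultdict(set)
    for a, b in deps:
        adj[a].add(b)
        adj[b].add(a)
    seen, sets = set(), []
    for op in ops:
        if op in seen:
            continue
        stack, comp = [op], []
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            comp.append(cur)
            stack.extend(adj[cur] - seen)
        sets.append(comp)
    return sets

def run_in_threads(op_sets, compute):
    """Start one thread per operator set and wait for all of them."""
    threads = [Thread(target=lambda s=s: [compute(o) for o in s]) for s in op_sets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With two disconnected dependency chains, `partition_operators` yields two sets that `run_in_threads` can compute concurrently.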
  2. The method according to claim 1, characterized in that partitioning the M operators to be executed according to the dependency relationships among them to obtain the N operator sets comprises:
    partitioning the M operators to be executed by using a graph partitioning algorithm according to the dependency relationships among the M operators to be executed, to obtain the N operator sets.
  3. The method according to claim 2, characterized in that after computing the dependency relationships among the M operators to be executed, the method further comprises:
    obtaining a directed graph of the M operators to be executed according to the dependency relationships among them;
    and wherein partitioning the M operators to be executed by using the graph partitioning algorithm according to the dependency relationships among them, to obtain the N operator sets, comprises:
    partitioning the directed graph of the M operators to be executed by using the graph partitioning algorithm according to the dependency relationships among them, to obtain N directed subgraphs, wherein each directed subgraph corresponds to one operator set.
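The directed-graph cut in claim 3 can be sketched as follows. The claim does not fix a particular graph partitioning algorithm, so this sketch uses weakly connected components (via union-find) as one plausible choice; each resulting subgraph's node set is one operator set, and the subgraph keeps the directed edges internal to it.

```python
def directed_subgraphs(nodes, edges):
    """nodes: operator ids; edges: list of (a, b) directed dependency edges.
    Returns (sorted_nodes, internal_edges) per weakly connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path-halving union-find lookup.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)   # union the two endpoints

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    comp_of = {n: find(n) for n in nodes}
    return [(sorted(g), [(a, b) for a, b in edges if comp_of[a] == root])
            for root, g in groups.items()]
```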
  4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    if the N operator sets are not mutually independent operator sets, determining, according to the dependency relationships among the N operator sets and by using a forward-backward alternating iterative scheduling algorithm, the operators that need to be executed in parallel and the operators that need to be executed serially in the N operator sets; and
    determining an execution order of the operators to be executed in parallel and the operators to be executed serially, and scheduling the operators to be executed in parallel and the operators to be executed serially in the N operator sets for computation.
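The "forward-backward alternating iterative scheduling algorithm" named in claim 4 is not defined on this page, so the sketch below substitutes a simpler stand-in for the same decision: topological leveling of the operator-set DAG. Sets in the same level have no mutual dependency and may execute in parallel; successive levels must execute serially.

```python
from collections import defaultdict

def schedule_levels(sets, set_deps):
    """sets: operator-set ids; set_deps: (a, b) meaning set b depends on set a.
    Returns a list of levels; each level is a batch that can run in parallel,
    and the levels themselves run serially in order."""
    indeg = {s: 0 for s in sets}
    out = defaultdict(list)
    for a, b in set_deps:
        out[a].append(b)
        indeg[b] += 1
    level = [s for s in sets if indeg[s] == 0]
    levels = []
    while level:
        levels.append(sorted(level))
        nxt = []
        for s in level:
            for t in out[s]:
                indeg[t] -= 1
                if indeg[t] == 0:
                    nxt.append(t)
        level = nxt
    return levels
```

For example, two independent sets feeding a third yields one parallel batch followed by one serial step.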
  5. The method according to claim 4, characterized in that scheduling the operators to be executed in parallel and the operators to be executed serially in the N operator sets for computation comprises:
    determining a scheduling strategy, and scheduling the operators to be executed in parallel and the operators to be executed serially in the N operator sets for computation according to the scheduling strategy;
    wherein the scheduling strategy comprises any one of an energy-consumption-priority strategy, a speed-priority strategy, and a balanced strategy.
  6. The method according to claim 5, characterized in that before determining the scheduling strategy, the method further comprises:
    acquiring memory resources and processing circuit resources available for neural network computation;
    and wherein determining the scheduling strategy comprises:
    determining the scheduling strategy according to the memory resources and processing circuit resources available for neural network computation.
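Claim 6 leaves the mapping from resources to strategy open; the sketch below is a hypothetical illustration of one such mapping. The thresholds (256 MB, 1024 MB, core counts) and the strategy names are invented for illustration and are not fixed by the patent.

```python
def choose_strategy(free_mem_mb, idle_cores):
    """Pick a scheduling strategy from available memory and processing
    resources (illustrative thresholds only)."""
    if free_mem_mb < 256 or idle_cores <= 1:
        return "energy_priority"   # scarce resources: minimize consumption
    if free_mem_mb > 1024 and idle_cores >= 4:
        return "speed_priority"    # ample resources: maximize throughput
    return "balanced"
```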
  7. The method according to any one of claims 1 to 6, characterized in that before starting the N threads to compute the operators in the N operator sets respectively, the method further comprises:
    estimating an expected execution time of a first operator, the first operator being an operator in any one of the N operator sets;
    and wherein after starting the N threads to compute the operators in the N operator sets respectively, the method further comprises:
    acquiring an actual execution time of the first operator, and correcting the expected execution time of the first operator.
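The estimate-then-correct loop of claim 7 can be written in one line. The claim does not name a correction rule; an exponential moving average is one plausible choice, sketched here under that assumption.

```python
def corrected_estimate(expected_ms, actual_ms, alpha=0.5):
    """Revise an operator's expected execution time toward its measured
    time; alpha in (0, 1] weights the new measurement."""
    return (1 - alpha) * expected_ms + alpha * actual_ms
```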
  8. The method according to claim 1, characterized in that the neural network algorithm framework comprises a controller unit, an arithmetic unit, and a storage unit, wherein the controller unit is configured to store and process instructions, the arithmetic unit is configured to compute the operators, and the storage unit is configured to store neurons and weights.
  9. The method according to claim 1, characterized in that each operator to be executed is any one of a Conv2D operator, a FusedBatchNorm operator, a Relu operator, a DepthwiseConv2dNative operator, a MaxPool operator, a BiasAdd operator, and a ConcatV2 operator.
  10. A neural network computing device, characterized in that the neural network computing device comprises a communication unit and a processing unit, wherein:
    the communication unit is configured to acquire M operators to be executed; and
    the processing unit is configured to: compute the dependency relationships among the M operators to be executed, where M is an integer greater than or equal to 2; partition the M operators to be executed according to the dependency relationships among them, to obtain N operator sets, wherein each of the N operator sets comprises at least one operator and N is an integer greater than or equal to 2; and, if the N operator sets are mutually independent operator sets, start N threads to compute the operators in the N operator sets respectively.
  11. The device according to claim 10, characterized in that the processing unit partitions the M operators to be executed according to the dependency relationships among them to obtain the N operator sets specifically by:
    partitioning the M operators to be executed by using a graph partitioning algorithm according to the dependency relationships among the M operators to be executed, to obtain the N operator sets.
  12. The device according to claim 11, characterized in that after computing the dependency relationships among the M operators to be executed, the processing unit is further configured to obtain a directed graph of the M operators to be executed according to the dependency relationships among them;
    and wherein the processing unit partitions the M operators to be executed by using the graph partitioning algorithm according to the dependency relationships among them, to obtain the N operator sets, specifically by:
    partitioning the directed graph of the M operators to be executed by using the graph partitioning algorithm according to the dependency relationships among them, to obtain N directed subgraphs, wherein each directed subgraph corresponds to one operator set.
  13. The device according to any one of claims 10 to 12, characterized in that the processing unit is further configured to: if the N operator sets are not mutually independent operator sets, determine, according to the dependency relationships among the N operator sets and by using a forward-backward alternating iterative scheduling algorithm, the operators that need to be executed in parallel and the operators that need to be executed serially in the N operator sets; determine an execution order of the operators to be executed in parallel and the operators to be executed serially; and schedule the operators to be executed in parallel and the operators to be executed serially in the N operator sets for computation.
  14. The device according to claim 13, characterized in that the processing unit schedules the operators to be executed in parallel and the operators to be executed serially in the N operator sets for computation specifically by: determining a scheduling strategy, and scheduling the operators to be executed in parallel and the operators to be executed serially in the N operator sets for computation according to the scheduling strategy;
    wherein the scheduling strategy comprises any one of an energy-consumption-priority strategy, a speed-priority strategy, and a balanced strategy.
  15. The device according to claim 14, characterized in that before determining the scheduling strategy, the processing unit is further configured to acquire memory resources and processing circuit resources available for neural network computation;
    and wherein the processing unit determines the scheduling strategy specifically by:
    determining the scheduling strategy according to the memory resources and processing circuit resources available for neural network computation.
  16. The device according to any one of claims 10 to 15, characterized in that the processing unit is further configured to estimate, before the N threads are started to compute the operators in the N operator sets respectively, an expected execution time of a first operator, the first operator being an operator in any one of the N operator sets;
    and wherein the processing unit is further configured to acquire, after the N threads are started to compute the operators in the N operator sets respectively, an actual execution time of the first operator and correct the expected execution time of the first operator.
  17. The device according to claim 10, characterized in that the neural network algorithm framework comprises a controller unit, an arithmetic unit, and a storage unit, wherein the controller unit is configured to store and process instructions, the arithmetic unit is configured to compute the operators, and the storage unit is configured to store neurons and weights.
  18. The device according to claim 10, characterized in that each operator to be executed is any one of a Conv2D operator, a FusedBatchNorm operator, a Relu operator, a DepthwiseConv2dNative operator, a MaxPool operator, a BiasAdd operator, and a ConcatV2 operator.
  19. A mobile terminal, characterized in that it comprises a processor and a memory, wherein the memory is configured to store one or more programs, the one or more programs are configured to be executed by the processor, and the programs include instructions for performing the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 9.
PCT/CN2020/074719 2019-02-12 2020-02-11 Neural network calculation method and apparatus, mobile terminal and storage medium WO2020164469A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910111499.6 2019-02-12
CN201910111499.6A CN109902819B (en) 2019-02-12 2019-02-12 Neural network computing method, device, mobile terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2020164469A1 true WO2020164469A1 (en) 2020-08-20

Family

ID=66944748

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/074719 WO2020164469A1 (en) 2019-02-12 2020-02-11 Neural network calculation method and apparatus, mobile terminal and storage medium

Country Status (2)

Country Link
CN (1) CN109902819B (en)
WO (1) WO2020164469A1 (en)


Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902819B (en) * 2019-02-12 2023-04-18 Oppo广东移动通信有限公司 Neural network computing method, device, mobile terminal and storage medium
CN110298437B (en) * 2019-06-28 2021-06-01 Oppo广东移动通信有限公司 Neural network segmentation calculation method and device, storage medium and mobile terminal
CN110378413A (en) * 2019-07-17 2019-10-25 Oppo广东移动通信有限公司 Neural network model processing method, device and electronic equipment
CN110503180B (en) * 2019-08-14 2021-09-14 Oppo广东移动通信有限公司 Model processing method and device and electronic equipment
CN110503199A (en) * 2019-08-14 2019-11-26 北京中科寒武纪科技有限公司 Method for splitting and device, the electronic equipment and storage medium of operation node
CN110674936A (en) * 2019-09-24 2020-01-10 上海寒武纪信息科技有限公司 Neural network processing method and device, computer equipment and storage medium
CN111062467B (en) * 2019-12-18 2023-05-12 开放智能机器(上海)有限公司 Automatic neural network subgraph segmentation method applied to AI heterogeneous compiler
CN111210005B (en) * 2019-12-31 2023-07-18 Oppo广东移动通信有限公司 Equipment operation method and device, storage medium and electronic equipment
CN111611479B (en) * 2020-05-07 2024-02-13 北京达佳互联信息技术有限公司 Data processing method and related device for network resource recommendation
CN111984400B (en) * 2020-07-17 2024-04-02 深圳云天励飞技术有限公司 Memory allocation method and device for neural network
WO2022261928A1 (en) * 2021-06-18 2022-12-22 华为技术有限公司 Operation acceleration method and operation accelerator
CN113657584B (en) * 2021-08-31 2024-04-09 安谋科技(中国)有限公司 Neural network model calculation method, data processing method, electronic device and medium
CN114429211A (en) * 2022-02-07 2022-05-03 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for generating information
CN114924745A (en) * 2022-05-19 2022-08-19 北京百度网讯科技有限公司 Operation method and device of deep learning compiler and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103677751A (en) * 2012-09-06 2014-03-26 阿里巴巴集团控股有限公司 Task parallel processing method and device
WO2016028425A1 (en) * 2014-08-21 2016-02-25 Qualcomm Incorporated Programmatic decoupling of task execution from task finish in parallel programs
US20160335119A1 (en) * 2015-05-12 2016-11-17 minds.ai inc Batch-based neural network system
CN108292241A (en) * 2015-10-28 2018-07-17 谷歌有限责任公司 Processing calculates figure
CN109902819A (en) * 2019-02-12 2019-06-18 Oppo广东移动通信有限公司 Neural computing method, apparatus, mobile terminal and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8214831B2 (en) * 2009-05-05 2012-07-03 International Business Machines Corporation Runtime dependence-aware scheduling using assist thread
US9959498B1 (en) * 2016-10-27 2018-05-01 Google Llc Neural network instruction set architecture
CN107729989B (en) * 2017-07-20 2020-12-29 安徽寒武纪信息科技有限公司 Device and method for executing artificial neural network forward operation
CN107748696B (en) * 2017-09-20 2020-05-01 深圳壹账通智能科技有限公司 Task scheduling method and terminal equipment


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631781A (en) * 2020-12-29 2021-04-09 上海商汤智能科技有限公司 Operator execution method and device, electronic equipment and storage medium
CN116523052A (en) * 2023-07-05 2023-08-01 成都阿加犀智能科技有限公司 Rapid reasoning method, device and equipment
CN116523052B (en) * 2023-07-05 2023-08-29 成都阿加犀智能科技有限公司 Rapid reasoning method, device and equipment

Also Published As

Publication number Publication date
CN109902819A (en) 2019-06-18
CN109902819B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2020164469A1 (en) Neural network calculation method and apparatus, mobile terminal and storage medium
US11151442B2 (en) Convolutional neural network processing method and device
CN112948079B (en) Task scheduling method, device, equipment and computer storage medium
WO2015066979A1 (en) Machine learning method for mapreduce task resource configuration parameters
CN108304925B (en) Pooling computing device and method
WO2023093375A1 (en) Computing resource acquisition method and apparatus, electronic device, and storage medium
CN106293947B (en) GPU-CPU (graphics processing Unit-Central processing Unit) mixed resource allocation system and method in virtualized cloud environment
CN110633959A (en) Method, device, equipment and medium for creating approval task based on graph structure
CN115150471B (en) Data processing method, apparatus, device, storage medium, and program product
EP3983950A1 (en) Neural network training in a distributed system
CN106778550B (en) Face detection method and device
CN109871270B (en) Scheduling scheme generation method and device
CN111984414B (en) Data processing method, system, equipment and readable storage medium
CN110874635A (en) Deep neural network model compression method and device
CN106250346B (en) A kind of realization method and system of intelligent computer
EP3979505A1 (en) Method and device for determining number of decoder iterations, and storage medium and electronic device
CN114021733A (en) Model training optimization method and device, computer equipment and storage medium
CN109739649B (en) Resource management method, device, equipment and computer readable storage medium
CN114091807A (en) Method, device and system for distributing and scheduling tasks of multiple unmanned aerial vehicles and storage medium
CN114297067A (en) Script testing method and device
US11531578B1 (en) Profiling and debugging for remote neural network execution
EP4024286A1 (en) Computing method and apparatus for convolutional neural network model
CN112149826A (en) Profile graph-based optimization method in deep neural network inference calculation
CN112308217A (en) Convolutional neural network acceleration method and system
CN109308327A (en) Figure calculation method device medium apparatus based on the compatible dot center's model of subgraph model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20756028

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20756028

Country of ref document: EP

Kind code of ref document: A1