CN110378413A

CN110378413A - Neural network model processing method, device and electronic device

Info

Publication number: CN110378413A
Application number: CN201910644306.3A
Authority: CN
Inventors: 陈岩
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2019-07-17
Filing date: 2019-07-17
Publication date: 2019-10-25

Abstract

The embodiment of the application discloses a neural network model processing method and device and electronic equipment. The method comprises the following steps: acquiring a neural network model to be configured; mapping the neural network model into a graph structure based on the dependency relationship of operators in the neural network model, wherein one node in the graph structure represents one operator in the neural network model; traversing the graph structure to obtain the execution sequence of the nodes; configuring an execution order of operators characterized by the nodes based on the execution order of the nodes. According to the method, the neural network model is converted into the graph structure, and the execution sequence of the operator represented by each node is determined by traversing the nodes in the graph structure, so that the execution sequences of all operators can be rapidly determined, and the operation efficiency of the whole model is improved.

Description

Neural network model processing method, device and electronic device

技术领域technical field

本申请涉及计算机技术领域，更具体地，涉及一种神经网络模型处理方法、装置以及电子设备。The present application relates to the field of computer technology, and more particularly, to a method, apparatus and electronic device for processing a neural network model.

背景技术Background technique

通常神经网络模型都是在计算机等设备进行训练的。为了便于训练好的神经网络模型可以在手机或者平板电脑等电子设备上运行，可以对训练好的神经网络模型进行优化，然而在优化过程中可能会造成神经网络模型的算子执行顺序错乱。Usually neural network models are trained on computers and other equipment. In order to facilitate the trained neural network model to run on electronic devices such as mobile phones or tablet computers, the trained neural network model can be optimized. However, during the optimization process, the execution sequence of the operators of the neural network model may be disordered.

发明内容SUMMARY OF THE INVENTION

鉴于上述问题，本申请提出了一种神经网络模型处理方法、装置以及电子设备，以改善上述问题。In view of the above problems, the present application proposes a neural network model processing method, apparatus, and electronic device to improve the above problems.

第一方面，本申请提供了一种神经网络模型处理方法，应用于电子设备，所述方法包括：获取待配置的神经网络模型；基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子；对所述图结构进行遍历，得到所述节点的执行顺序；基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。In a first aspect, the present application provides a method for processing a neural network model, which is applied to an electronic device. The method includes: acquiring a neural network model to be configured; The neural network model is mapped to a graph structure, wherein a node in the graph structure represents an operator in the neural network model; the graph structure is traversed to obtain the execution order of the nodes; based on the node The execution order of the node configures the execution order of the operators represented by the node.

第二方面，本申请提供了一种神经网络模型处理装置，运行于电子设备，所述方法包括：模型获取单元，用于获取待配置的神经网络模型；模型处理单元，用于基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子；遍历单元，用于对所述图结构进行遍历，得到所述节点的执行顺序；顺序确定单元，用于基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。In a second aspect, the present application provides a neural network model processing apparatus, which runs on an electronic device, and the method includes: a model obtaining unit, configured to obtain a neural network model to be configured; and a model processing unit, configured based on the neural network model The dependency relationship of operators in the network model, the neural network model is mapped to a graph structure, wherein a node in the graph structure represents an operator in the neural network model; a traversal unit is used for the The graph structure is traversed to obtain the execution order of the nodes; the order determination unit is configured to configure the execution order of the operators represented by the nodes based on the execution order of the nodes.

第四方面，本申请提供了一种电子设备，包括多核处理器、启动控制器以及存储器，所述存储器用于存储待加载的数据；一个或多个程序被存储在所述启动控制器中并被配置为由所述启动控制器执行以实现上述的方法。In a fourth aspect, the present application provides an electronic device, comprising a multi-core processor, a boot controller, and a memory, where the memory is used to store data to be loaded; one or more programs are stored in the boot controller and is configured to be executed by the startup controller to implement the method described above.

第五方面，本申请提供了一种计算机可读存储介质，所述计算机可读存储介质中存储有程序代码，其中，在所述程序代码被启动控制器运行时执行上述的方法。In a fifth aspect, the present application provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, wherein the above-mentioned method is executed when the program code is run by a startup controller.

本申请提供的一种神经网络模型处理方法、装置以及电子设备，在获取待配置的神经网络模型后，基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子。然后，对所述图结构进行遍历，得到所述节点的执行顺序，基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。从而通过将神经网络模型转换为图结构，再通过遍历图结构中的节点的方式来确定每个节点所表征的算子的执行顺序，进而实现可以快速的确定所有算子的执行顺序，提升整个模型的运算效率。In a neural network model processing method, device and electronic device provided by the present application, after acquiring a neural network model to be configured, the neural network model is mapped to a graph structure based on the dependencies of operators in the neural network model , wherein a node in the graph structure represents an operator in the neural network model. Then, the graph structure is traversed to obtain the execution order of the nodes, and the execution order of the operators represented by the nodes is configured based on the execution order of the nodes. Therefore, by converting the neural network model into a graph structure, and then by traversing the nodes in the graph structure to determine the execution order of the operators represented by each node, it is possible to quickly determine the execution order of all operators and improve the overall performance. The computational efficiency of the model.

附图说明Description of drawings

为了更清楚地说明本申请实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的一些实施例，对于本领域技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings that are used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those skilled in the art, other drawings can also be obtained from these drawings without creative effort.

图1示出了本申请实施例提出的一种神经网络模型处理方法的流程图；1 shows a flowchart of a method for processing a neural network model proposed by an embodiment of the present application;

图2示出了本申请一实施例提出的一种神经网络模型处理方法的中一种神经网络模型所对应的图示意图；2 shows a schematic diagram corresponding to a neural network model in a neural network model processing method proposed by an embodiment of the present application;

图3示出了本申请一实施例提出的一种神经网络模型处理方法的中处理资源消耗程度与时间的映射关系示意图；3 shows a schematic diagram of a mapping relationship between processing resource consumption degree and time in a neural network model processing method proposed by an embodiment of the present application;

图4示出了本申请另一实施例提出的一种神经网络模型处理方法的流程图；FIG. 4 shows a flowchart of a method for processing a neural network model proposed by another embodiment of the present application;

图5示出了本申请再一实施例提出的一种神经网络模型处理方法的流程图；FIG. 5 shows a flowchart of a method for processing a neural network model proposed by yet another embodiment of the present application;

图6示出了本申请实施例提出的一种神经网络模型处理装置的结构框图；FIG. 6 shows a structural block diagram of an apparatus for processing a neural network model proposed by an embodiment of the present application;

图7示出了本申请另一实施例提出的一种神经网络模型处理装置的结构框图；FIG. 7 shows a structural block diagram of an apparatus for processing a neural network model proposed by another embodiment of the present application;

图8示出了本申请再一实施例提出的一种神经网络模型处理装置的结构框图；FIG. 8 shows a structural block diagram of a neural network model processing apparatus proposed by still another embodiment of the present application;

图9示出了本申请的用于执行根据本申请实施例的神经网络模型处理方法的电子设备的结构框图；FIG. 9 shows a structural block diagram of an electronic device of the present application for executing the neural network model processing method according to an embodiment of the present application;

图10是本申请实施例的用于保存或者携带实现根据本申请实施例的神经网络模型处理方法的程序代码的存储单元。FIG. 10 is a storage unit for storing or carrying a program code for implementing the neural network model processing method according to the embodiment of the present application according to the embodiment of the present application.

具体实施方式Detailed ways

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

神经网络(Neural Networks,NN)是由大量的、简单的处理单元(称为神经元)广泛地互相连接而形成的复杂网络系统。神经网络具有大规模并行、分布式存储和处理、自组织、自适应和自学能力。通常在神经网络模型中包括有大量的算子。其中，可以理解的是，算子可以看做是一个神经网络模型中的部分算法过程，算子可以把函数映成函数，或者把函数映成数。Neural network (Neural Networks, NN) is a complex network system formed by extensive interconnection of a large number of simple processing units (called neurons). Neural networks have massive parallelism, distributed storage and processing, self-organization, self-adaptation and self-learning capabilities. A large number of operators are usually included in the neural network model. Among them, it can be understood that an operator can be regarded as a part of the algorithm process in a neural network model, and an operator can map a function into a function, or map a function into a number.

然而，发明人在研究中发现，当神经网络的算子较多或者复杂度较高时，整体的算子执行顺序无法较为快速的确定。However, the inventor found in the research that when the neural network has more operators or higher complexity, the overall operator execution order cannot be determined relatively quickly.

再者，神经网络的算子顺序大都采用服务器端的神经网络框架训练好的算子顺序，例如，类似通过Tensorflow这些框架训练出来的神经网络模型算子顺序，当需要对训练得到的神经网络模型增加算子时，只能在原有的网络图中特定算子之间增加，导致在做算子优化时需要重新生成一遍神经网络图，造成模型转换速度会变慢很多。其原因在于，神经网络图的每个算子之间是有依赖关系的，也就是某个算子要想正确执行，其所依赖的算子必须执行完，所以算子是有执行顺序的，像Tensorflow这些PC端框架固化的模型一般已经针对算子排好顺序了。比如算子B依赖于算子A，那在文件(存储有神经网络模型的文件)里算子A就是排在B前面的，那如果需要在A和B之间增加算子C，则一般的做法是重新生成网络，按顺序拷贝直到算子A，然后添加算子C，然后拷贝算子B，然后再拷贝后续算子。也就是说C必须在A和B之间添加，这样才能保证算子正确执行。这导致添加算子变成一个很复杂的操作，会让神经网络模型转换时间很长，进而造成模型转换速度会变慢很多。Furthermore, the operator sequence of the neural network mostly adopts the sequence of operators trained by the neural network framework on the server side. For example, similar to the operator sequence of the neural network model trained by the frameworks such as Tensorflow, when it is necessary to increase the neural network model obtained by training. When the operator is used, it can only be added between specific operators in the original network diagram. As a result, the neural network diagram needs to be regenerated when the operator is optimized, resulting in a much slower model conversion speed. The reason is that each operator in the neural network graph is dependent, that is, if an operator is to be executed correctly, the operator it depends on must be executed, so the operator has an execution order. Models solidified by PC-side frameworks like Tensorflow are generally arranged for operators. For example, operator B depends on operator A. In the file (the file storing the neural network model), operator A is ranked in front of B. If operator C needs to be added between A and B, the general The method is to regenerate the network, copy it in order until operator A, then add operator C, then copy operator B, and then copy subsequent operators. That is to say, C must be added between A and B in order to ensure the correct execution of the operator. This causes adding operators to become a very complex operation, which will make the neural network model conversion take a long time, which in turn will cause the model conversion speed to be much slower.

还有，神经网络模型在移动端的部署，一般是通过在PC等计算机上将训练好的模型固化成一个文件，然后移动端的神经网络框架解析该文件，读入内存，按序执行神经网络模型中的算子。但是因为移动端的资源有限，往往只能运行一些小模型，大模型需要经过一系列的优化，比如算子融合，网络剪枝，模型量化，网络切割，等等来优化模型，以达到可以在移动端运行的目的。这些优化往往会导致原有的算子执行顺序改变，因而需要通过一种行之有效的方式来重新将算子排序，从而使得神经网络模型可以正常执行。In addition, the deployment of the neural network model on the mobile terminal is generally by solidifying the trained model into a file on a computer such as a PC, and then the neural network framework on the mobile terminal parses the file, reads it into memory, and executes the neural network model in sequence. the operator. However, due to the limited resources on the mobile terminal, only small models can often be run. Large models need to undergo a series of optimizations, such as operator fusion, network pruning, model quantization, network cutting, etc. the purpose of running the terminal. These optimizations often lead to changes in the execution order of the original operators, so it is necessary to reorder the operators in an effective way, so that the neural network model can be executed normally.

因此，发明人提出了本申请中可以改善上述问题的神经网络模型处理方法、装置以及电子设备。Therefore, the inventor proposes a neural network model processing method, apparatus and electronic device that can improve the above problems in the present application.

下面将结合附图具体描述本申请的各实施例。The embodiments of the present application will be described in detail below with reference to the accompanying drawings.

请参阅图1，本申请实施例提供的一种神经网络模型处理方法，应用于电子设备，所述方法包括：Referring to FIG. 1, a method for processing a neural network model provided by an embodiment of the present application is applied to an electronic device, and the method includes:

步骤S110：获取待配置的神经网络模型。Step S110: Obtain the neural network model to be configured.

作为一种方式，待配置的神经网络模型可以为直接从网络端获取到的神经网络模型。此外，待配置的神经网络模型也可以为对于从网络端获取到的神经网络模型进行优化以后的神经网络模型。其中，对神经网络模型进行优化可以理解为对神经网络模型进行算子融合，网络剪枝，模型量化，网络切割等操作。As a method, the neural network model to be configured may be a neural network model obtained directly from the network end. In addition, the neural network model to be configured may also be a neural network model obtained by optimizing the neural network model obtained from the network end. Among them, the optimization of the neural network model can be understood as operator fusion, network pruning, model quantization, network cutting and other operations on the neural network model.

步骤S120：基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子。Step S120: Map the neural network model into a graph structure based on the operator dependency in the neural network model, wherein a node in the graph structure represents an operator in the neural network model.

图结构是表示物件与物件之间的关系的数学对象。如果给图结构的每条边规定一个方向，那么得到的图结构称为有向图。在有向图中，与一个节点相关联的边有出边和入边之分。相反，边没有方向的图称为无向图。A graph structure is a mathematical object that represents the relationship between objects. If a direction is specified for each edge of a graph structure, the resulting graph structure is called a directed graph. In a directed graph, there are outgoing and incoming edges associated with a node. Conversely, a graph whose edges have no direction is called an undirected graph.

作为一种方式，本实施例中所述图结构为有向无环图(Directed AcyclicGraph)。通常而言一个无环的有向图称做有向无环图，有向无环图其实与数组、排列、区块链一样，也是一种数据结构。但与区块链不同，有向无环图将最长链共识改成最重链共识。传统区块链上，新发布的区块会加入到原先的最长链之上，并且以所有节点都认为最长的链为准，依次无限蔓延。而有向无环图中，每个新加入的单元，不仅仅只加入到长链里的一个区块，而是加入到之前的所有区块。As a way, the graph structure in this embodiment is a Directed AcyclicGraph (Directed AcyclicGraph). Generally speaking, an acyclic directed graph is called a directed acyclic graph. A directed acyclic graph is actually a data structure like an array, an array, and a blockchain. But unlike the blockchain, the directed acyclic graph changes the longest chain consensus to the heaviest chain consensus. On the traditional blockchain, the newly released block will be added to the original longest chain, and the chain that is considered to be the longest by all nodes shall prevail, and the chain will spread indefinitely. In the directed acyclic graph, each newly added unit is not only added to one block in the long chain, but also added to all previous blocks.

可以理解的是，神经网络模型是由许多的算子所组合成的。而一些算子之间由具有相互的依赖关系。而这里的依赖关系可以理解为数据上的依赖。例如，在神经网络模型中有算子A、算子B、算子C、算子D、算子E、算子F、算子G以及算子H。其中，神经网络模型定义了算子C的输入数据为算子B的输出数据，而算子B的输入数据又是算子A的输出数据，算子A仅有数据输出。那么从这里就可以得出，算子D的运算是需要算子B的输出数据的，那么算子D是依赖于算子B的，同理，算子B的运算是需要算子A的输出数据的，那么算子B是依赖于算子A的。It can be understood that the neural network model is composed of many operators. And some operators have mutual dependencies. The dependencies here can be understood as data dependencies. For example, there are operator A, operator B, operator C, operator D, operator E, operator F, operator G and operator H in the neural network model. Among them, the neural network model defines that the input data of operator C is the output data of operator B, and the input data of operator B is the output data of operator A, and operator A has only data output. Then it can be concluded from this that the operation of operator D requires the output data of operator B, then operator D depends on operator B. Similarly, the operation of operator B requires the output of operator A. data, then operator B is dependent on operator A.

再者，算子C的输出数据为算子F的输出数据，算子E的输入数据为算子A的输出数据，而算子F的输入数据除了来源于算子C外，还来源于算子B。算子G的输入数据来源于算子B、算子C以及算子E，算子H的输入数据来源于算子D和算子F。那么基于前述对于依赖关系的解释，算子F同时依赖于算子B以及算子C，算子E依赖于算子A，算子G同时依赖于算子B、算子C以及算子E，算子H同时依赖于算子D以及算子F。Furthermore, the output data of operator C is the output data of operator F, the input data of operator E is the output data of operator A, and the input data of operator F is not only from operator C, but also from operator C. Sub B. The input data of operator G comes from operator B, operator C and operator E, and the input data of operator H comes from operator D and operator F. Then based on the above explanation of the dependency relationship, operator F depends on operator B and operator C at the same time, operator E depends on operator A, and operator G depends on operator B, operator C and operator E at the same time, Operator H depends on both operator D and operator F.

进而，基于上述依赖关系得出了如图2所示的有相无环图。图中有节点0、节点1、节点2、节点3、节点4、节点5、节点6以及节点7。其中，节点0表征算子A，节点1表征算子B，节点2表征算子C，节点3表征算子D，节点4表征算子E，节点5表征算子F，节点6表征算子G，节点7表征算子H。其中，箭头所指向的节点依赖于箭头的起始节点。Furthermore, the phased acyclic graph shown in Fig. 2 is obtained based on the above dependencies. There are node 0, node 1, node 2, node 3, node 4, node 5, node 6 and node 7 in the figure. Among them, node 0 represents operator A, node 1 represents operator B, node 2 represents operator C, node 3 represents operator D, node 4 represents operator E, node 5 represents operator F, and node 6 represents operator G , node 7 represents operator H. Among them, the node pointed by the arrow depends on the starting node of the arrow.

那么在这种方式下，可先的，电子设备所述对所述图结构进行遍历，得到所述节点的执行顺序的步骤包括：对所述有向无环图进行深度优先遍历，得到所述节点的执行顺序。Then in this way, the electronic device may firstly traverse the graph structure to obtain the execution order of the nodes, including: performing depth-first traversal on the directed acyclic graph to obtain the The execution order of the nodes.

步骤S130：对所述图结构进行遍历，得到所述节点的执行顺序。Step S130: Traverse the graph structure to obtain the execution sequence of the nodes.

其中，如图2所示，虽然对于每相邻的两个节点而言，相互之间的执行顺序是较为清楚的，但是如果将不具有相邻关系的节点均考虑在内，那么对于整体的执行顺序就可能出现错误。例如，对于图2中的节点1、节点2以及节点5，比较明确的是先执行节点1在执行节点5以及先执行节点2再执行节点5。但是，若在节点1执行后，就需要执行节点5的情况下，就会造成节点5比节点2先运行，但是节点5所对应的算子在运行过程中对于节点2所对应的算子的输出数据可能是有一定依赖的，进而这样就会造成数据错误。那么作为一种方式，电子设备可以通过对图结构的每个节点进行遍历，进而得到整体上每个节点的执行顺序是如何的。Among them, as shown in Figure 2, although the execution order between each adjacent two nodes is relatively clear, if all nodes that do not have adjacent relationships are considered, then for the overall The order of execution may be wrong. For example, for node 1, node 2, and node 5 in FIG. 2, it is relatively clear that node 1 is executed first, node 5 is executed, and node 2 is executed first, and then node 5 is executed. However, if node 5 needs to be executed after node 1 is executed, it will cause node 5 to run before node 2, but the operator corresponding to node 5 will have no effect on the operator corresponding to node 2 during the running process. The output data may have certain dependencies, which will cause data errors. Then, as a method, the electronic device can traverse each node of the graph structure to obtain the overall execution order of each node.

作为一种方式，所述对所述有向无环图进行深度优先遍历，得到所述节点的执行顺序。As a method, the depth-first traversal of the directed acyclic graph is performed to obtain the execution order of the nodes.

需要说明的是，在图结构的节点遍历方式中，包括深度优先搜索和广度优先搜索。深度优先搜索属于图算法的一种，英文缩写为DFS即Depth First Search，其过程简要来说是对每一个可能的分支路径深入到不能再深入为止，而且每个节点只能遍历一次。It should be noted that the node traversal method of the graph structure includes depth-first search and breadth-first search. Depth-first search belongs to a kind of graph algorithm, the English abbreviation is DFS, which is Depth First Search.

在具体的遍历过程中，因为有两个顶点依赖节点0，节点1和节点3，这里选择其中一个顶点(选择哪一个都不会有影响)，这里选择1号节点，作为继续搜索的节点。找到节点1后需要找到依赖该节点1的顶点，从图中可以看到节点3，节点5，节点6均依赖节点1。这里选择节点3作为下一个搜寻的节点。然后再继续寻找下一个依赖节点3的节点，因为只有节点7依赖节点3，所以选择节点7。因为没有节点依赖节点7，所以将节点7作为第一个目标节点。因为找不到依赖节点7的节点了，所以下一步需要回到上一个节点3。因为依赖节点3的节点只有节点7，并且节点7已经遍历过了，所以无法找到未遍历的依赖节点3的节点，因此将节点3也作为目标节点。而此时的目标节点依次包括节点7以及节点3。In the specific traversal process, because there are two vertices that depend on node 0, node 1 and node 3, one of the vertices is selected here (whichever one is selected will not matter), and node 1 is selected here as the node to continue the search. After finding node 1, you need to find the vertices that depend on node 1. From the figure, you can see that node 3, node 5, and node 6 all depend on node 1. Here node 3 is selected as the next searched node. Then continue to find the next node that depends on node 3. Since only node 7 depends on node 3, node 7 is selected. Since no node depends on node 7, node 7 is taken as the first target node. Because the node that depends on node 7 cannot be found, the next step is to go back to the previous node 3. Because the only node that depends on node 3 is node 7, and node 7 has been traversed, the untraversed node that depends on node 3 cannot be found, so node 3 is also used as the target node. The target node at this time includes node 7 and node 3 in turn.

继续选择节点3的上一个节点即节点1，寻找未遍历过的依赖节点1的节点，这里还剩下节点5和节点6，选择节点5。找到节点5之后，继续寻找依赖节点5的节点，这里只有节点7，并且已经遍历过了，因此将节点5标记为目标节点。此时的目标节点依次包括节点7、节点3以及节点5。Continue to select the previous node of node 3, that is, node 1, and find the untraversed nodes that depend on node 1. There are still nodes 5 and 6, and node 5 is selected. After finding node 5, continue to look for nodes that depend on node 5. There is only node 7 here, and it has been traversed, so node 5 is marked as the target node. The target node at this time includes node 7 , node 3 and node 5 in sequence.

回到节点5的上一个节点1，继续寻找依赖节点1的节点，这里只剩下节点6。因为没有节点依赖节点6，所以将节点6标记为目标节点。此时，目标节点依次包括节点7、节点3、节点5以及节点6。此时再回到节点6的上一个节点即节点6，此时因为没有节点依赖它并且没有被遍历过的，因此将节点1标记为目标节点。此时，目标节点依次包括节点7、节点3、节点5、节点6以及节点1。再回到节点1的上一个节点即节点0，找到依赖该节点0的节点4。因为没有未遍历的顶点依赖节点4，所以将节点4标记为目标节点。此时，目标节点依次包括节点7、节点3、节点5、节点6、节点1以及节点4。再回到节点4的上一个节点即节点0，因为没有节点再依赖节点0，所以将节点0标记为目标节点。此时，目标节点依次包括节点7、节点3、节点5、节点6、节点1、节点4以及节点0。因为已经回到输入节点即节点0，此时需要从另一个输入节点即节点2开始遍历，因为依赖节点2的节点都遍历过了，所以直接将节点2标记为目标节点。Go back to the previous node 1 of node 5, and continue to look for nodes that depend on node 1. Only node 6 is left here. Since no node depends on node 6, node 6 is marked as the target node. At this time, the target node includes node 7 , node 3 , node 5 and node 6 in sequence. At this time, go back to the previous node of node 6, that is, node 6. At this time, because no node depends on it and has not been traversed, node 1 is marked as the target node. At this time, the target node includes node 7 , node 3 , node 5 , node 6 and node 1 in sequence. Go back to the previous node of node 1, that is, node 0, and find node 4 that depends on node 0. Since no untraversed vertices depend on node 4, node 4 is marked as the target node. At this time, the target node includes node 7 , node 3 , node 5 , node 6 , node 1 and node 4 in sequence. Go back to the previous node of node 4, that is, node 0. Since there is no node that depends on node 0, node 0 is marked as the target node. At this time, the target node includes node 7, node 3, node 5, node 6, node 1, node 4, and node 0 in sequence. Since we have returned to the input node, namely node 0, we need to start the traversal from another input node, namely node 2. Because the nodes that depend on node 2 have all been traversed, node 2 is directly marked as the target node.

那么基于上述的目标节点的标记顺序就可以得到每个节点的执行顺序，即执行的先后顺序为节点2、节点0、节点4、节点1、节点6、节点5、节点3以及节点7。其中，为了便于电子设备可以快速的识别所标记的目标节点的执行顺序，可以利用栈结构的存储空间来存储所标记的目标节点。那么在这种方式下，在遍历过程中，电子设备对所述有向无环图进行深度优先遍历，将遍历过程中获取到的目标节点依次存入栈结构的存储空间，所述目标节点为无从属节点的节点或者从属节点均被遍历的节点；将所述存储空间中节点的出栈顺序作为所述节点的执行顺序。基于图2中所示的实例，存入栈结构的存储空间的目标节点的顺序为节点7、节点3、节点5、节点6、节点1、节点4、节点0以及节点2。那么对应的出栈顺序为节点2、节点0、节点4、节点1、节点6、节点5、节点3以及节点7。Then, the execution sequence of each node can be obtained based on the marking sequence of the above target nodes, that is, the execution sequence is node 2, node 0, node 4, node 1, node 6, node 5, node 3 and node 7. Wherein, in order to facilitate the electronic device to quickly identify the execution sequence of the marked target nodes, the storage space of the stack structure may be used to store the marked target nodes. Then, in this way, during the traversal process, the electronic device performs depth-first traversal on the directed acyclic graph, and stores the target nodes obtained during the traversal process into the storage space of the stack structure in sequence, and the target nodes are A node without a subordinate node or a node to which all subordinate nodes are traversed; the popping order of the nodes in the storage space is taken as the execution order of the nodes. Based on the example shown in FIG. 2 , the order of the target nodes stored in the storage space of the stack structure is node 7 , node 3 , node 5 , node 6 , node 1 , node 4 , node 0 , and node 2 . Then the corresponding popping order is node 2, node 0, node 4, node 1, node 6, node 5, node 3 and node 7.

可以理解的是，对于不依赖其他节点的节点电子设备可以识别为输入顶点。那么在电子设备识别到有多个输入顶点的情况下，电子设备可以基于指定的起始输入顶点，对所述有向无环图进行深度优先遍历，得到所述节点的执行顺序。It can be understood that for nodes that do not depend on other nodes, electronic devices can be identified as input vertices. Then, when the electronic device recognizes that there are multiple input vertices, the electronic device can perform depth-first traversal on the directed acyclic graph based on the specified initial input vertex to obtain the execution order of the nodes.

可以理解的是，在深度优先遍历的这种情况下，对于先遍历的输入顶点将会被靠后执行。例如，前述实例中，先从输入顶点0(节点0)开始遍历，然后再从输入顶点2(节点2)开始遍历，那么对于先遍历的节点0却是在后执行的。类似的，对于被多个节点依赖的节点，对于先被遍历的节点也会被在后执行。例如，图2中的节点3、节点5以及节点6都依赖于节点1，那么如果先从节点3开始进行遍历，那么相应的，节点3会在节点5或者节点6之后才会被执行。而对于不同的节点所表征的算子在被执行时所消耗的处理资源是可能会有所不同的。并且，电子设备的总的处理资源是有限的，若当在进行某个节点对应算子的执行过程中需要消耗大量的处理资源，而电子设备总有其他的程序也需要消耗大量的处理资源，那么就会造成电子设备的卡顿或者宕机。It can be understood that in the case of depth-first traversal, the input vertices traversed first will be executed later. For example, in the foregoing example, the traversal is started from the input vertex 0 (node 0) first, and then the traversal is started from the input vertex 2 (node 2), then the first traversed node 0 is executed later. Similarly, for nodes that are depended on by multiple nodes, the nodes that are traversed first will also be executed later. For example, node 3, node 5, and node 6 in Figure 2 all depend on node 1. If the traversal starts from node 3 first, then node 3 will be executed after node 5 or node 6 accordingly. However, the processing resources consumed by the operators represented by different nodes may be different when they are executed. In addition, the total processing resources of the electronic device are limited. If a large amount of processing resources are consumed during the execution of the operator corresponding to a node, and the electronic device always has other programs that also consume a large amount of processing resources, This will cause the electronic equipment to freeze or crash.

那么作为一种改善上述问题的方式，电子设备可以在进行节点遍历的过程中，对于所有的分支情况均进行一次遍历，进而得到多种所有节点的执行顺序。例如，对于有节点0、节点1、节点2、节点3、节点4、节点5、节点6以及节点7的情况下，可以得到一种执行顺序为节点2、节点0、节点4、节点1、节点6、节点5、节点3以及节点7，还可以得到另一种执行顺序为节点2、节点0、节点4、节点1、节点6、节点3、节点5以及节点7，还可以得到再一种执行顺序为节点2、节点0、节点4、节点1、节点3、节点5、节点6以及节点7。而对于其中的不同的节点所对应的算子的执行过程中，所消耗的处理资源不同的情况下，电子设备在确定了所有的节点的执行顺序后，就已经可以获取到神经网络模型在执行过程中每个阶段所需要消耗的处理资源的相对多少。其中一个阶段表征一个算子的执行过程。Then, as a way to improve the above problem, the electronic device may perform a traversal of all branch situations once during the node traversal process, thereby obtaining various execution orders of all nodes. For example, in the case of node 0, node 1, node 2, node 3, node 4, node 5, node 6 and node 7, an execution order can be obtained as node 2, node 0, node 4, node 1, Node 6, node 5, node 3, and node 7, another execution order can be obtained: node 2, node 0, node 4, node 1, node 6, node 3, node 5, and node 7, and another execution sequence can be obtained. The execution order is node 2, node 0, node 4, node 1, node 3, node 5, node 6, and node 7. In the case of different processing resources consumed during the execution of the operators corresponding to different nodes, the electronic device can already obtain the neural network model after determining the execution order of all nodes. The relative amount of processing resources that each stage of the process needs to consume. One of the phases represents the execution of an operator.

那么对应的，电子设备还可以预估在神经网络模型的整个执行过程中，其他程序对于处理资源的消耗状况，然后将其他程序对于处理资源的消耗状态与前述的多种执行顺序多对应的资源消耗状况进行匹配，将多个执行顺序中与其他程序对于处理资源的消耗状况冲突最小的一种作为最终的执行顺序。Correspondingly, the electronic device can also estimate the consumption status of processing resources by other programs during the entire execution process of the neural network model, and then compare the consumption status of processing resources by other programs with the resources corresponding to the aforementioned various execution sequences. The consumption status is matched, and the one with the least conflict with the consumption status of processing resources of other programs among the multiple execution sequences is used as the final execution sequence.

其中，作为一种方式，电子设备可以将多种执行顺序的处理资源消耗状态与处理时间进行关联进而建立处理资源消耗与时间之间的二维映射关系。如图3所示，其中的横坐标标识时间，纵坐标标识的资源消耗程度，而其中每个柱形表征一个节点的执行过程。例如，t1到t2表征一个节点所对应的算子的执行过程。可以理解的是，每个节点所表征的算子的执行所消耗的时间可能是不同的，图中只是示例性的显示。并且，为了便于计算，对于图示中每个节点所对应的资源消耗程度未该节点所对应的算子运行过程中的平均资源消耗程度。In one way, the electronic device may associate processing resource consumption states of various execution sequences with processing time to establish a two-dimensional mapping relationship between processing resource consumption and time. As shown in Figure 3, the abscissa indicates the time, the ordinate indicates the degree of resource consumption, and each column represents the execution process of a node. For example, t1 to t2 represent the execution process of the operator corresponding to a node. It can be understood that the execution time of the operator represented by each node may be different, and the figure is only an exemplary display. Moreover, for the convenience of calculation, the resource consumption degree corresponding to each node in the figure is not the average resource consumption degree during the operation of the operator corresponding to the node.

那么基于图3所示的方式，就可以获取到多种执行顺序所对应的资源消耗程度与时间之间的映射关系。然后再获取对应时间段内电子设备的其他程序的资源消耗程序与时间之间的映射关系，然后将他程序的资源消耗程序与时间之间的映射关系与多种执行顺序所对应的资源消耗程度与时间之间的映射关系进行分段匹配即可获取到资源消耗冲突最小的一种执行顺序。Then, based on the method shown in FIG. 3 , the mapping relationship between the resource consumption degree and the time corresponding to various execution orders can be obtained. Then, the mapping relationship between the resource consumption programs and time of other programs of the electronic device in the corresponding time period is obtained, and then the mapping relationship between the resource consumption programs and time of other programs and the resource consumption degree corresponding to various execution sequences are obtained. By segment matching with the mapping relationship between times, an execution order with the least resource consumption conflict can be obtained.

可以理解的是，其中的分段匹配中的每一段表征的是一个节点所表征的算子的执行过程。例如，图3中所示的t1到t2为一段，t2到t3也是为一段。那么在匹配过程中，电子设备可以计算得到电子设备在t1到t2这个时间段内其他程序的处理资源消耗程度，然后将该t1到t2这个时间段内其他程序的处理资源消耗程度与t1到t2这个时间段内的节点所对应的算子执行所需的处理资源进行匹配，进而依次对每个阶段完成匹配，从而得到整体的匹配程度。It can be understood that each segment in the segment matching represents the execution process of an operator represented by a node. For example, t1 to t2 shown in FIG. 3 is a segment, and t2 to t3 are also a segment. Then in the matching process, the electronic device can calculate the processing resource consumption degree of other programs of the electronic device in the time period from t1 to t2, and then compare the processing resource consumption degree of other programs in the time period from t1 to t2 with that from t1 to t2. The processing resources required for the execution of the operators corresponding to the nodes in this time period are matched, and then the matching is completed for each stage in turn, so as to obtain the overall matching degree.

在本实施例中，作为一种方式，可以通过评分的方式来完成匹配。可选的，电子设备可以将资源消耗程度的差值与评分之间建立一个映射关系，在对每个阶段的资源消耗程度进行匹配时，可以得到一个评分，例如，差值越小评分越高，那么电子设备就可以得到每个阶段的评分，从而得到每种执行顺序的总评分，将评分最低的那个执行顺序作为选定的执行顺序。具体的，以图3所示的几个阶段为例，节点a对应的执行阶段为t1到t2这个时间段，其对应的处理资源消耗程度a1，那么获取到电子设备在t1到t2这个时间段内其他应用程序的处理资源消耗程度为a2，那么进而将a1减去a2的差值取绝对值后的值所关联的评分作为这个阶段的评分。类似的，可以得到t2到t3、t3到t4、t4到t5、t5到t6、t6到t7、t7到t8着几个时间段的评分，进而就可以得到整体的总评分。In this embodiment, as a method, the matching can be completed by scoring. Optionally, the electronic device can establish a mapping relationship between the difference in the resource consumption degree and the score, and can obtain a score when matching the resource consumption degree of each stage. For example, the smaller the difference, the higher the score. , then the electronic device can obtain the score of each stage, so as to obtain the total score of each execution order, and take the execution order with the lowest score as the selected execution order. Specifically, taking the stages shown in Fig. 3 as an example, the execution stage corresponding to node a is the time period from t1 to t2, and the corresponding processing resource consumption level a1, then it is obtained that the electronic device is in the time period from t1 to t2. The processing resource consumption degree of other applications in the system is a2, then the score associated with the absolute value of the difference between a1 minus a2 is taken as the score at this stage. Similarly, the scores of several time periods from t2 to t3, t3 to t4, t4 to t5, t5 to t6, t6 to t7, and t7 to t8 can be obtained, and then the overall total score can be obtained.

需要说明的是，对于其中每个时间段内电子设备中的其他应用程序的资源消耗程度也可以是平均消耗程度。再者，可以理解的，前述的多种执行顺序的选择过程是在实际的神经网络模型执行之前进行的。那么其中的每个阶段内节点所对应的算子的运行所消耗的时间以及处理资源占用程度为预先计算的预估值。类似的，在每个阶段内的电子设备的其他应用程序的处理资源占用程度也为预估值。It should be noted that the resource consumption degree of other applications in the electronic device in each time period may also be an average consumption degree. Furthermore, it can be understood that the aforementioned selection process of various execution sequences is performed before the actual neural network model is executed. Then, the time consumed by the operation of the operator corresponding to the node in each stage and the occupancy degree of processing resources are pre-calculated estimated values. Similarly, the processing resource occupancy level of other applications of the electronic device in each stage is also an estimated value.

作为一种方式，对于每个阶段内节点所对应的算子的运行所消耗的时间以及处理资源占用程度可以为在进行神经网络模型训练的过程中就预先计算完成。那么在这种方式下，电子设备在从网络接收到存储有神经网络模型的文件的同时，还会接收到标识神经网络模型中每个节点的运算耗时以及资源占用程度的文件。再者，对于每个阶段内电子设备的其他应用程序的处理资源占用程度可以根据电子设备的应用程序历史运行记录来进行预估。可以理解的是，电子设备在对于每天的什么时间段内运行什么应用程序是可以进行记录的，那么对应的在运行过程中处理资源的消耗程度也是可以进行记录的。那么电子设备通过对一段时间内的处理资源的消耗程度的统计就可以获取到一定的规律，进而对后续的每个时间段的处理资源消耗的程度进行预估。As a method, the time consumed by the operation of the operator corresponding to the node in each stage and the degree of processing resource occupancy can be pre-calculated during the training of the neural network model. Then, in this way, the electronic device receives the file storing the neural network model from the network, and also receives the file identifying the computation time and resource occupancy level of each node in the neural network model. Furthermore, the processing resource occupation degree of other applications of the electronic device in each stage can be estimated according to the historical running records of the applications of the electronic device. It can be understood that, the electronic device can record what application program is running in what time period of each day, and then the corresponding consumption degree of processing resources during the running process can also be recorded. Then, the electronic device can obtain a certain rule through statistics of the consumption degree of processing resources in a period of time, and then estimate the degree of consumption of processing resources in each subsequent period of time.

需要说明的是，前述所示的每个阶段为每天中的时间段，例如，12点10分到12点12分这种时间段。再例如，12点1分30秒到12点2分这个时间段，甚至是更短或者更长的时间段。而其中所述的处理资源可以理解为处理器的占用率。It should be noted that each stage shown above is a time period in a day, for example, a time period such as 12:10 to 12:12. For another example, the time period from 12:1:30 to 12:2, or even a shorter or longer time period. The processing resources described therein can be understood as the occupancy rate of the processor.

步骤S140：基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。Step S140: Configure the execution sequence of the operators represented by the node based on the execution sequence of the node.

本申请提供的一种神经网络模型处理方法，在获取待配置的神经网络模型后，基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子。然后，对所述图结构进行遍历，得到所述节点的执行顺序，基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。从而通过将神经网络模型转换为图结构，再通过遍历图结构中的节点的方式来确定每个节点所表征的算子的执行顺序，进而实现可以快速的确定所有算子的执行顺序，提升整个模型的运算效率。In a method for processing a neural network model provided by the present application, after acquiring a neural network model to be configured, the neural network model is mapped to a graph structure based on the dependencies of operators in the neural network model, wherein the A node in the graph structure represents an operator in the neural network model. Then, the graph structure is traversed to obtain the execution order of the nodes, and the execution order of the operators represented by the nodes is configured based on the execution order of the nodes. Therefore, by converting the neural network model into a graph structure, and then by traversing the nodes in the graph structure to determine the execution order of the operators represented by each node, it is possible to quickly determine the execution order of all operators and improve the overall performance. The computational efficiency of the model.

请参阅图4，本申请实施例提供的一种神经网络模型处理方法，应用于电子设备，所述方法包括：Referring to FIG. 4 , a method for processing a neural network model provided by an embodiment of the present application is applied to an electronic device, and the method includes:

步骤S210：接收训练得到的神经网络模型。Step S210: Receive the neural network model obtained by training.

步骤S220：对所述训练得到的神经网络模型进行优化，得到待配置的神经网络模型；其中，所述待配置的神经网络模型为适配所述电子设备的数据计算能力的神经网络模型。Step S220: Optimizing the neural network model obtained by the training to obtain a neural network model to be configured; wherein the neural network model to be configured is a neural network model adapted to the data computing capability of the electronic device.

所述对所述训练得到的神经网络模型进行优化的步骤至少包括下列步骤中的至少一个：对所述训练得到的神经网络模型进行算子融合；对所述训练得到的神经网络模型进行网络剪枝；对所述训练得到的神经网络模型进行模型量化；以及对所述训练得到的神经网络模型进行网络切割。The step of optimizing the neural network model obtained by training at least includes at least one of the following steps: performing operator fusion on the neural network model obtained by training; performing network trimming on the neural network model obtained by training. performing model quantization on the neural network model obtained by the training; and performing network cutting on the neural network model obtained by the training.

其中，算子融合可以理解为合并一些算子以减少计算或者减少内存拷贝(比如Conv2D和BatchNorm的合并)。网络剪枝，可以理解为去除一些不必要的算子，以简化网络(比如去掉一些只在训练时会用到的冗余算子)。再者，一般神经网络模型的内部的计算都采用了浮点数计算，浮点数的计算会消耗比较大的计算资源(空间和cpu/gpu时间),如果在不影响神经网络模型准确率的情况下，神经网络模型内部可以采用其他简单数值类型进行计算的话，计算速度会提高很多，消耗的计算资源会大大减小，尤其是对于移动设备来说，这点尤其重要。量化即通过减少表示每个权重所需的比特数来压缩原始神经网络模型。Among them, operator fusion can be understood as merging some operators to reduce calculations or reduce memory copies (such as the merger of Conv2D and BatchNorm). Network pruning can be understood as removing some unnecessary operators to simplify the network (such as removing some redundant operators that are only used during training). Furthermore, the internal calculation of the general neural network model uses floating-point calculation, and the calculation of floating-point number will consume relatively large computing resources (space and cpu/gpu time), if the accuracy of the neural network model is not affected. , if other simple numerical types can be used for calculation inside the neural network model, the calculation speed will be greatly improved, and the consumption of computing resources will be greatly reduced, especially for mobile devices, this is especially important. Quantization is the compression of the original neural network model by reducing the number of bits required to represent each weight.

步骤S230：基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子；Step S230: mapping the neural network model to a graph structure based on the operator dependency in the neural network model, wherein a node in the graph structure represents an operator in the neural network model;

步骤S240：对所述图结构进行遍历，得到所述节点的执行顺序。Step S240: Traverse the graph structure to obtain the execution sequence of the nodes.

步骤S250：基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。Step S250: Configure the execution order of the operators represented by the node based on the execution order of the node.

本申请提供的一种神经网络模型处理方法，在获取待配置的神经网络模型后，基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子。然后，对所述图结构进行遍历，得到所述节点的执行顺序，基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。从而通过将神经网络模型转换为图结构，再通过遍历图结构中的节点的方式来确定每个节点所表征的算子的执行顺序，进而实现可以快速的确定所有算子的执行顺序，提升整个模型的运算效率。并且，本实施例中提供的方法可以是对进行优化后神经网络模型进行计算，进而实现在神经网络模型优化以后可以快速的获取到准确的算子执行顺序。In a method for processing a neural network model provided by the present application, after acquiring a neural network model to be configured, the neural network model is mapped to a graph structure based on the dependencies of operators in the neural network model, wherein the A node in the graph structure represents an operator in the neural network model. Then, the graph structure is traversed to obtain the execution order of the nodes, and the execution order of the operators represented by the nodes is configured based on the execution order of the nodes. Therefore, by converting the neural network model into a graph structure, and then by traversing the nodes in the graph structure to determine the execution order of the operators represented by each node, it is possible to quickly determine the execution order of all operators and improve the overall performance. The computational efficiency of the model. In addition, the method provided in this embodiment may calculate the optimized neural network model, thereby realizing that the accurate operator execution order can be quickly obtained after the neural network model is optimized.

请参阅图5，本申请实施例提供的一种神经网络模型处理方法，应用于电子设备，所述方法包括：Referring to FIG. 5 , a method for processing a neural network model provided by an embodiment of the present application is applied to an electronic device, and the method includes:

步骤S310：接收训练得到的神经网络模型。Step S310: Receive the neural network model obtained by training.

步骤S320：在训练得到的神经网络模型中指定算子的执行顺序后增加目标算子，得到待配置的神经网络模型，所述目标算子用于将所述指定算子执行得到数据的存储格式转换为目标格式。Step S320: adding a target operator after specifying the execution order of the operators in the neural network model obtained by training to obtain the neural network model to be configured, where the target operator is used to execute the specified operator to obtain the storage format of the data Convert to the target format.

需要说明的是，经过存储格式转换后可以使得电子设备可以更快的进行数据计算。例如，在智能手机、平板电脑等电子设备中，通过GPU进行计算的算子一般有两种内存布局：Buffer和Image，前者不需要对模型做处理，但是计算速度不如后者，所以一般会使用Image的方式，但是需要将网络模型中的权值从plain的内存转换成Image的格式，因此需要在网络图中增加算子用于内存转换。It should be noted that, after the storage format conversion, the electronic device can perform data calculation faster. For example, in electronic devices such as smart phones and tablet computers, there are generally two memory layouts for operators that are calculated by GPU: Buffer and Image. The former does not need to process the model, but the calculation speed is not as fast as the latter, so it is generally used. Image method, but the weights in the network model need to be converted from plain memory to Image format, so operators need to be added to the network graph for memory conversion.

步骤S330：基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子；Step S330: Map the neural network model into a graph structure based on the operator dependency in the neural network model, wherein a node in the graph structure represents an operator in the neural network model;

步骤S340：对所述图结构进行遍历，得到所述节点的执行顺序。Step S340: Traverse the graph structure to obtain the execution sequence of the nodes.

步骤S350：基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。Step S350: Configure the execution order of the operators represented by the node based on the execution order of the node.

本申请提供的一种神经网络模型处理方法，在获取待配置的神经网络模型后，基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子。然后，对所述图结构进行遍历，得到所述节点的执行顺序，基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。从而通过将神经网络模型转换为图结构，再通过遍历图结构中的节点的方式来确定每个节点所表征的算子的执行顺序，进而实现可以快速的确定所有算子的执行顺序，提升整个模型的运算效率。并且，本实施例中提供的方法可以是对进行插入新的算子后的神经网络模型进行计算，进而实现在神经网络模型在插入新的算子后可以快速的获取到准确的算子执行顺序。In a method for processing a neural network model provided by the present application, after acquiring a neural network model to be configured, the neural network model is mapped to a graph structure based on the dependencies of operators in the neural network model, wherein the A node in the graph structure represents an operator in the neural network model. Then, the graph structure is traversed to obtain the execution order of the nodes, and the execution order of the operators represented by the nodes is configured based on the execution order of the nodes. Therefore, by converting the neural network model into a graph structure, and then by traversing the nodes in the graph structure to determine the execution order of the operators represented by each node, it is possible to quickly determine the execution order of all operators and improve the overall performance. The computational efficiency of the model. In addition, the method provided in this embodiment may be to calculate the neural network model after inserting a new operator, so as to realize that after the neural network model inserts a new operator, the accurate execution order of the operators can be quickly obtained. .

请参阅图6，本申请实施例提供的一种神经网络模型处理装置400，运行于电子设备，所述装置400包括：Referring to FIG. 6 , an apparatus 400 for processing a neural network model provided by an embodiment of the present application operates on an electronic device, and the apparatus 400 includes:

模型获取单元410，用于获取待配置的神经网络模型。The model obtaining unit 410 is configured to obtain the neural network model to be configured.

作为一种方式，如图7所示，模型获取单元410包括：As one way, as shown in FIG. 7 , the model obtaining unit 410 includes:

模型接收子单元411，用于接收训练得到的神经网络模型；The model receiving subunit 411 is used for receiving the neural network model obtained by training;

模型优化子单元412，用于对所述训练得到的神经网络模型进行优化，得到待配置的神经网络模型；其中，所述待配置的神经网络模型为适配所述电子设备的数据计算能力的神经网络模型。The model optimization subunit 412 is used to optimize the neural network model obtained by the training, and obtain the neural network model to be configured; wherein, the neural network model to be configured is adapted to the data computing capability of the electronic device. Neural network model.

作为一种方式，如图8所示，模型获取单元410包括：As one way, as shown in FIG. 8 , the model obtaining unit 410 includes:

模型接收子单元413，用于接收训练得到的神经网络模型。The model receiving subunit 413 is configured to receive the neural network model obtained by training.

算子增加子单元414，在训练得到的神经网络模型中指定算子的执行顺序后增加目标算子，得到待配置的神经网络模型，所述目标算子用于将所述指定算子执行得到数据的存储格式转换为目标格式。The operator adding subunit 414 adds a target operator after the execution order of the specified operator in the neural network model obtained by training, to obtain the neural network model to be configured, and the target operator is used to execute the specified operator to obtain The storage format of the data is converted to the target format.

其中，可选的，所述对所述训练得到的神经网络模型进行优化的步骤至少包括下列步骤中的至少一个：对所述训练得到的神经网络模型进行算子融合；对所述训练得到的神经网络模型进行网络剪枝；对所述训练得到的神经网络模型进行模型量化；以及对所述训练得到的神经网络模型进行网络切割。Wherein, optionally, the step of optimizing the neural network model obtained by training includes at least one of the following steps: performing operator fusion on the neural network model obtained by training; Network pruning is performed on the neural network model; model quantization is performed on the neural network model obtained by the training; and network cutting is performed on the neural network model obtained by the training.

模型处理单元420，用于基于所述神经网络模型中算子的依赖关系，将所述神经网络模型映射为图结构，其中，所述图结构中的一个节点表征所述神经网络模型中的一个算子。A model processing unit 420, configured to map the neural network model to a graph structure based on the dependencies of operators in the neural network model, wherein a node in the graph structure represents one of the neural network models operator.

遍历单元430，用于对所述图结构进行遍历，得到所述节点的执行顺序。The traversing unit 430 is configured to traverse the graph structure to obtain the execution order of the nodes.

顺序确定单元440，用于基于所述节点的执行顺序配置所述节点所表征的算子的执行顺序。The sequence determination unit 440 is configured to configure the execution sequence of the operators represented by the node based on the execution sequence of the node.

作为一种方式，所述图结构为有向无环图。在这种方式下，模型处理单元420，具体用于对所述有向无环图进行深度优先遍历，得到所述节点的执行顺序。遍历单元430，具体用于对所述有向无环图进行深度优先遍历，将遍历过程中获取到的目标节点依次存入栈结构的存储空间，所述目标节点为无从属节点的节点或者从属节点均被遍历的节点；将所述存储空间中节点的出栈顺序作为所述节点的执行顺序。As an approach, the graph structure is a directed acyclic graph. In this manner, the model processing unit 420 is specifically configured to perform depth-first traversal on the directed acyclic graph to obtain the execution order of the nodes. The traversal unit 430 is specifically configured to perform depth-first traversal on the directed acyclic graph, and store the target nodes obtained in the traversal process into the storage space of the stack structure in turn, and the target nodes are nodes without subordinate nodes or subordinate nodes. A node whose nodes are all traversed; the pop-out order of the nodes in the storage space is taken as the execution order of the nodes.

作为一种方式，遍历单元430，具体用于基于指定的起始输入顶点，对所述有向无环图进行深度优先遍历，得到所述节点的执行顺序。In one way, the traversal unit 430 is specifically configured to perform depth-first traversal on the directed acyclic graph based on a specified initial input vertex to obtain the execution order of the nodes.

需要说明的是，本申请中装置实施例与前述方法实施例是相互对应的，装置实施例中具体的原理可以参见前述方法实施例中的内容，此处不再赘述。It should be noted that the apparatus embodiments in the present application correspond to the foregoing method embodiments, and the specific principles in the apparatus embodiments may refer to the content in the foregoing method embodiments, which will not be repeated here.

下面将结合图9对本申请提供的一种电子设备进行说明。An electronic device provided by the present application will be described below with reference to FIG. 9 .

请参阅图9，基于上述的神经网络模型处理方法、装置，本申请实施例还提供的另一种可以执行前述神经网络模型处理方法的电子设备200。电子设备200包括相互耦合的一个或多个(图中仅示出一个)处理器102、存储器104以及网络模块106。其中，该存储器104中存储有可以执行前述实施例中内容的程序，而处理器102可以执行该存储器104中存储的程序。Referring to FIG. 9 , based on the above-mentioned method and apparatus for processing a neural network model, an embodiment of the present application further provides another electronic device 200 that can execute the aforementioned method for processing a neural network model. The electronic device 200 includes one or more (only one shown in the figure) a processor 102, a memory 104, and a network module 106 that are coupled to each other. Wherein, the memory 104 stores a program that can execute the content in the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104 .

其中，处理器102可以包括一个或者多个用于处理数据的核。处理器102利用各种接口和线路连接整个电子设备200内的各个部分，通过运行或执行存储在存储器104内的指令、程序、代码集或指令集，以及调用存储在存储器104内的数据，执行电子设备200的各种功能和处理数据。可选地，处理器102可以采用数字信号处理(Digital SignalProcessing，DSP)、现场可编程门阵列(Field－Programmable Gate Array，FPGA)、可编程逻辑阵列(Programmable Logic Array，PLA)中的至少一种硬件形式来实现。处理器102可集成中央处理器(Central Processing Unit，CPU)、图像处理器(Graphics ProcessingUnit，GPU)和调制解调器等中的一种或几种的组合。其中，CPU主要处理操作系统、用户界面和应用程序等；GPU用于负责显示内容的渲染和绘制；调制解调器用于处理无线通信。可以理解的是，上述调制解调器也可以不集成到处理器102中，单独通过一块通信芯片进行实现。The processor 102 may include one or more cores for processing data. The processor 102 uses various interfaces and lines to connect various parts of the entire electronic device 200, and executes by running or executing the instructions, programs, code sets or instruction sets stored in the memory 104, and calling the data stored in the memory 104. Various functions of the electronic device 200 and processing data. Optionally, the processor 102 may employ at least one of a digital signal processing (Digital Signal Processing, DSP), a Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA), and a Programmable Logic Array (Programmable Logic Array, PLA). implemented in hardware. The processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. Among them, the CPU mainly handles the operating system, user interface and application programs, etc.; the GPU is used for rendering and drawing of the display content; the modem is used to handle wireless communication. It can be understood that, the above-mentioned modem may not be integrated into the processor 102, and is implemented by a communication chip alone.

存储器104可以包括随机存储器(Random Access Memory，RAM)，也可以包括只读存储器(Read-Only Memory)。存储器104可用于存储指令、程序、代码、代码集或指令集。存储器104可包括存储程序区和存储数据区，其中，存储程序区可存储用于实现操作系统的指令、用于实现至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现下述各个方法实施例的指令等。存储数据区还可以存储终端100在使用中所创建的数据(比如电话本、音视频数据、聊天记录数据)等。The memory 104 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory). Memory 104 may be used to store instructions, programs, codes, sets of codes, or sets of instructions. The memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like. The storage data area may also store data created by the terminal 100 during use (such as phone book, audio and video data, chat record data) and the like.

所述网络模块106用于接收以及发送电磁波，实现电磁波与电信号的相互转换，从而与通讯网络或者其他设备进行通讯，例如和音频播放设备进行通讯。所述网络模块106可包括各种现有的用于执行这些功能的电路元件，例如，天线、射频收发器、数字信号处理器、加密/解密芯片、用户身份模块(SIM)卡、存储器等等。所述网络模块106可与各种网络如互联网、企业内部网、无线网络进行通讯或者通过无线网络与其他设备进行通讯。上述的无线网络可包括蜂窝式电话网、无线局域网或者城域网。例如，网络模块106可以与基站进行信息交互。The network module 106 is used for receiving and sending electromagnetic waves, realizing mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example, communicate with an audio playback device. The network module 106 may include various existing circuit elements for performing these functions, eg, antennas, radio frequency transceivers, digital signal processors, encryption/decryption chips, subscriber identity module (SIM) cards, memory, etc. . The network module 106 can communicate with various networks such as the Internet, an intranet, a wireless network, or communicate with other devices through a wireless network. The aforementioned wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. For example, the network module 106 may interact with the base station for information.

请参考图10，其示出了本申请实施例提供的一种计算机可读存储介质的结构框图。该计算机可读介质1100中存储有程序代码，所述程序代码可被处理器调用执行上述方法实施例中所描述的方法。Please refer to FIG. 10 , which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. The computer-readable medium 1100 stores program codes, and the program codes can be invoked by the processor to execute the methods described in the above method embodiments.

计算机可读存储介质1100可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。可选地，计算机可读存储介质1100包括非易失性计算机可读介质(non-transitory computer-readable storage medium)。计算机可读存储介质1100具有执行上述方法中的任何方法步骤的程序代码810的存储空间。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。程序代码1110可以例如以适当形式进行压缩。The computer-readable storage medium 1100 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM. Optionally, the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium. Computer readable storage medium 1100 has storage space for program code 810 to perform any of the method steps in the above-described methods. These program codes can be read from or written to one or more computer program products. Program code 1110 may be compressed, for example, in a suitable form.

最后应说明的是：以上实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不驱使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand: it can still be Modifications are made to the technical solutions described in the foregoing embodiments, or some technical features thereof are equivalently replaced; and these modifications or replacements do not drive the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. a neural network model processing method, is characterized in that, is applied to electronic equipment, and described method comprises:

Get the neural network model to be configured;

Based on the dependencies of the operators in the neural network model, the neural network model is mapped into a graph structure, wherein a node in the graph structure represents an operator in the neural network model;

Traversing the graph structure to obtain the execution order of the nodes;

The execution order of the operators represented by the node is configured based on the execution order of the node.

2. The method according to claim 1, wherein the graph structure is a directed acyclic graph; the step of traversing the graph structure to obtain the execution order of the nodes comprises:

A depth-first traversal is performed on the directed acyclic graph to obtain the execution order of the nodes.

3. The method according to claim 2, wherein the step of performing depth-first traversal on the directed acyclic graph to obtain the execution order of the nodes comprises:

Depth-first traversal is performed on the directed acyclic graph, and the target nodes obtained in the traversal process are sequentially stored in the storage space of the stack structure, where the target nodes are nodes without subordinate nodes or nodes whose subordinate nodes are all traversed;

The popping order of the nodes in the storage space is taken as the execution order of the nodes.

4 . The method according to claim 1 , wherein the directed acyclic graph has a plurality of input vertices, and the depth-first traversal of the directed acyclic graph is performed to obtain the execution order of the nodes. 5 . The steps include:

Based on the specified initial input vertex, depth-first traversal is performed on the directed acyclic graph to obtain the execution order of the nodes.

5. The method according to claim 1, wherein the step of acquiring the neural network model to be configured comprises:

Receive the trained neural network model;

The neural network model obtained by the training is optimized to obtain a neural network model to be configured; wherein, the neural network model to be configured is a neural network model adapted to the data computing capability of the electronic device.

6. The method according to claim 5, wherein the step of optimizing the neural network model obtained by the training at least comprises at least one of the following steps:

Perform operator fusion on the neural network model obtained by the training;

Perform network pruning on the neural network model obtained by the training;

performing model quantization on the trained neural network model; and

Perform network cutting on the neural network model obtained by the training.

7. The method according to claim 5, wherein the step of optimizing the neural network model obtained by the training to obtain the neural network model to be configured comprises:

A target operator is added after specifying the execution order of the operators in the neural network model obtained by training to obtain the neural network model to be configured. The target operator is used to convert the storage format of the data obtained by executing the specified operator into a target. Format.

8. A neural network model processing device, characterized in that, running on an electronic device, the device comprising:

a model obtaining unit, used to obtain the neural network model to be configured;

A model processing unit, configured to map the neural network model to a graph structure based on the operator dependency in the neural network model, wherein a node in the graph structure represents an operator in the neural network model. son;

a traversal unit, used for traversing the graph structure to obtain the execution order of the nodes;

An order determination unit, configured to configure the execution order of the operators represented by the node based on the execution order of the node.

9. An electronic device, comprising a processor and a memory;

One or more programs are stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-7.

10. A computer-readable storage medium, wherein a program code is stored in the computer-readable storage medium, wherein when the program code is executed by a processor, any one of claims 1-7 is executed method.