CN111738423A - Compiling method, device, storage medium and electronic device for neural network model - Google Patents
Compiling method, device, storage medium and electronic device for neural network model
- Publication number
- CN111738423A (application CN202010601610.2A)
- Authority
- CN
- China
- Prior art keywords: feature map, feasible, parameter, neural network, network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Description
Technical Field
The present application relates to the field of deep learning, and in particular to a compiling method and apparatus for a neural network model, a storage medium, and an electronic device.
Background
Deep learning enables machines to imitate human activities such as seeing, hearing, and thinking, and has solved many complex pattern-recognition problems; it has made great progress in technical fields such as natural language processing, image recognition, speech recognition, data mining, and personalized recommendation.
How to construct a neural network model, and how to compile the constructed model, are core parts of deep learning. At present, when a constructed neural network model is compiled into an executable file, the large number of parameters in the model makes the compiled executable computationally expensive and inefficient at runtime. The compilation process therefore needs to be optimized to improve the execution efficiency of the compiled executable. However, current optimization methods are relatively simple, and the running efficiency of compiled executables remains very low.
Summary of the Invention
The purposes of the present application include, for example, providing a compiling method and apparatus for a neural network model, a storage medium, and an electronic device, which can compile a neural network model into an executable file, reduce the amount of computation the executable file performs at runtime, and improve the executable file's running efficiency.
The embodiments of the present application can be implemented as follows:
In a first aspect, an embodiment provides a method for compiling a neural network model, the method comprising: acquiring the original feature map parameters of each convolutional layer in the neural network model; splitting the original feature map parameters according to an input-output parameter relation and a memory capacity to obtain a feasible feature map parameter set for each convolutional layer; determining, from the feasible feature map parameter set of each convolutional layer, a corresponding target feature map parameter for each convolutional layer, the target feature map parameter having the highest data-handling efficiency; and generating an executable file for the neural network model according to the target feature map parameter corresponding to each convolutional layer.
In an optional implementation, the original feature map parameters include original output feature map parameters, and the step of splitting the original feature map parameters according to the input-output parameter relation and the memory capacity to obtain the feasible feature map parameter set of each convolutional layer includes: splitting the original output feature map parameters according to the input-output parameter relation and the memory capacity to obtain a feasible output feature map parameter set for each convolutional layer; determining a feasible input feature map parameter set for each convolutional layer according to the feasible output feature map parameter set of each convolutional layer and the input-output parameter relation; and determining the feasible feature map parameter set of each convolutional layer according to the feasible output feature map parameter set and the feasible input feature map parameter set of each convolutional layer.
In an optional implementation, the step of determining, from the feasible feature map parameter set of each convolutional layer, a corresponding target feature map parameter for each convolutional layer includes: determining, according to the value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer, the amount of repeated data loading corresponding to each feasible feature map parameter; and taking the feasible feature map parameter with the smallest amount of repeated data loading in each convolutional layer's feasible feature map parameter set as the target feature map parameter corresponding to that layer.
In an optional implementation, the neural network model includes multiple convolutional layers, and the step of determining, from the feasible feature map parameter set of each convolutional layer, a corresponding target feature map parameter for each convolutional layer includes: determining, according to the value of each feasible feature map parameter in the feasible feature map parameter set of a target convolutional layer, the amount of repeated data loading corresponding to each feasible feature map parameter; determining an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to a preset number of processing cores and the amount of repeated data loading, the target convolutional layer being any one of the multiple convolutional layers and the optimal feature map parameter being the feasible feature map parameter with the highest data-handling efficiency in the target convolutional layer's feasible feature map parameter set; and taking the optimal feature map parameter as the target feature map parameter corresponding to the target convolutional layer.
In an optional implementation, the step of generating an executable file for the neural network model according to the target feature map parameter corresponding to each convolutional layer includes: generating three-dimensional direct memory access (DMA) data-transfer instructions according to the target feature map parameter corresponding to each convolutional layer, and generating the executable file of the neural network model according to the three-dimensional DMA data-transfer instructions.
In an optional implementation, the method further includes: executing the executable file to implement the data processing function of the neural network model.
In a second aspect, an embodiment provides an apparatus for compiling a neural network model, including: an acquisition module configured to acquire the original feature map parameters of each convolutional layer in the neural network model; a splitting module configured to split the original feature map parameters according to an input-output parameter relation and a memory capacity to obtain a feasible feature map parameter set for each convolutional layer, the splitting module being further configured to determine, from the feasible feature map parameter set of each convolutional layer, a corresponding target feature map parameter for each convolutional layer, the target feature map parameter having the highest data-handling efficiency; and a generation module configured to generate an executable file for the neural network model according to the target feature map parameter corresponding to each convolutional layer.
In an optional implementation, the original feature map parameters include original output feature map parameters; the splitting module is configured to split the original output feature map parameters according to the input-output parameter relation and the memory capacity to obtain a feasible output feature map parameter set for each convolutional layer; the splitting module is further configured to determine a feasible input feature map parameter set for each convolutional layer according to the feasible output feature map parameter set of each convolutional layer and the input-output parameter relation; and the splitting module is further configured to determine the feasible feature map parameter set of each convolutional layer according to the feasible output feature map parameter set and the feasible input feature map parameter set of each convolutional layer.
In an optional implementation, the splitting module is configured to determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer, the amount of repeated data loading corresponding to each feasible feature map parameter; the splitting module is further configured to take the feasible feature map parameter with the smallest amount of repeated data loading in each convolutional layer's feasible feature map parameter set as the target feature map parameter corresponding to that layer.
In an optional implementation, the neural network model includes multiple convolutional layers; the splitting module is configured to determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of a target convolutional layer, the amount of repeated data loading corresponding to each feasible feature map parameter; the splitting module is further configured to determine an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to a preset number of processing cores and the amount of repeated data loading, the target convolutional layer being any one of the multiple convolutional layers and the optimal feature map parameter being the feasible feature map parameter with the highest data-handling efficiency in the target convolutional layer's feasible feature map parameter set; and the splitting module is further configured to take the optimal feature map parameter as the target feature map parameter corresponding to the target convolutional layer.
In an optional implementation, the generation module is configured to generate three-dimensional direct memory access (DMA) data-transfer instructions according to the target feature map parameter corresponding to each convolutional layer, and to generate the executable file of the neural network model according to the three-dimensional DMA data-transfer instructions.
In an optional implementation, the apparatus further includes a running module configured to execute the executable file to implement the data processing function of the neural network model.
In a third aspect, an embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method for compiling a neural network model according to any one of the foregoing implementations.
In a fourth aspect, an embodiment provides an electronic device including a processor and a memory, the memory storing machine-readable instructions, the processor being configured to execute the machine-readable instructions to implement the method for compiling a neural network model according to any one of the foregoing implementations.
The beneficial effects of the embodiments of the present application include, for example: with the compiling method for a neural network model provided by the present application, the original feature map parameters of each convolutional layer in the neural network model can be split, and a corresponding target feature map parameter with the highest data-handling efficiency can be found for each convolutional layer, thereby improving the model's overall data reuse rate, reducing the runtime computation of the executable file corresponding to the model, and improving the executable's computational efficiency. That is, the present application can compile a neural network model into an executable file, reduce the executable's runtime computation, and improve its running efficiency.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 is a structural block diagram of an electronic device provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for compiling a neural network model provided by an embodiment of the present application;
FIG. 3 is a flowchart of S110 in the method for compiling a neural network model provided by an embodiment of the present application;
FIG. 4 is a flowchart of S120 in the method for compiling a neural network model provided by an embodiment of the present application;
FIG. 5 is another flowchart of S120 in the method for compiling a neural network model provided by an embodiment of the present application;
FIG. 6 is another flowchart of the method for compiling a neural network model provided by an embodiment of the present application;
FIG. 7 is a functional block diagram of an apparatus for compiling a neural network model provided by an embodiment of the present application.
Reference numerals: 100: electronic device; 110: memory; 120: processor; 130: bus; 140: communication interface; 200: apparatus for compiling a neural network model; 210: acquisition module; 220: splitting module; 230: generation module; 240: running module.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be further defined or explained in subsequent drawings.
It should be noted that the features in the embodiments of the present application may be combined with one another where no conflict arises.
In the process of realizing the technical solutions of the embodiments of the present application, the inventors found the following:
At present, a constructed neural network model is usually compiled into an executable file by a compiler. Because the convolutional layers of a neural network model contain a large number of parameters, compiling directly without optimization produces an executable that is computationally expensive and inefficient at runtime. Compilers therefore often resort to feature map splitting (splitting one operation of a convolutional layer into multiple sub-operations that the processor computes independently) combined with reusing the weights or the feature map (a data reuse scheme) to optimize the convolution operation.
However, feature map splitting is currently implemented only at the hardware level, where the splitting and data reuse schemes are fixed. For a neural network model with multiple convolutional layers, the prior art can apply only weight reuse to all of the model's convolutional layers, or only feature map reuse to all of them. Yet for the convolutions of different layers, different splitting and data reuse schemes affect the computation load differently. When the feature map data is large and the weight data is small, reusing the feature map while repeatedly loading the weights reduces the overall data load and thus the layer's convolution workload; when the feature map is small and the weight data is large, reusing the weights while repeatedly loading the feature map reduces the overall data load and the convolution workload.
In common neural network models the feature map gradually shrinks as the convolutional layers execute, so both of the above cases occur. Applying only weight reuse, or only feature map reuse, to all of the model's convolutional layers therefore cannot minimize the overall data load of the layers. In other words, current optimization methods are relatively simple, and the running efficiency of compiled executables remains very low.
Therefore, to remedy the above defects, the embodiments of the present application propose a compiling method and apparatus for a neural network model, a storage medium, and an electronic device, which can compile a neural network model into an executable file, reduce the executable's runtime computation, and improve its running efficiency. It should be noted that the defects of the above prior-art solutions are results obtained by the inventors through practice and careful study; therefore, both the discovery of the above problems and the solutions proposed below for these problems should be regarded as contributions made by the inventors to the present application.
Referring to FIG. 1, a structural block diagram of an electronic device 100 provided by an embodiment of the present application. The electronic device 100 may include a memory 110, a processor 120, a bus 130, and a communication interface 140; the memory 110, the processor 120, and the communication interface 140 are electrically connected to one another, directly or indirectly, to enable transmission or exchange of data. For example, these components may be electrically connected to one another through one or more buses 130 or signal lines. The processor 120 may process information and/or data related to the compilation of the neural network model to perform one or more of the functions described herein. For example, the processor 120 may acquire the original feature map parameters of each convolutional layer in the neural network model and compile the model according to these data, thereby implementing the compiling method for a neural network model provided by the present application.
The memory 110 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like.
The processor 120 may be an integrated circuit chip with signal-processing capability. The processor 120 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), or the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It can be understood that the structure shown in FIG. 1 is merely illustrative; the electronic device 100 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1. The components shown in FIG. 1 may be implemented in hardware, software, or a combination thereof. For example, the electronic device 100 may be a neural network accelerator, a server, a computer, a mobile phone, a tablet, a cloud platform, or the like; the present application therefore does not limit the specific type of the electronic device 100.
Below, for ease of understanding, the following embodiments take the electronic device 100 shown in FIG. 1 as an example and, with reference to the drawings, describe in detail the compiling method for a neural network model provided by the embodiments of the present application. Referring to FIG. 2, which shows a flowchart of the compiling method for a neural network model provided by an embodiment of the present application. The method may be applied to the electronic device 100 described above and may include the following steps:
S100: acquire the original feature map parameters of each convolutional layer in the neural network model.
In some possible embodiments, the electronic device 100 may obtain a pre-built neural network model from the storage medium of another device (for example, a back-end server or a cloud server) or from its own storage medium; the present application therefore does not limit how the neural network model is obtained.
After obtaining the neural network model, the electronic device 100 may derive the original feature map parameters of each convolutional layer (for example, the input and output feature map parameters of the layer) from the model's global input feature map size and operator parameters (for example, the stride and kernel size parameters).
In addition, the original feature map parameters of each convolutional layer may be pre-stored in the storage medium of the electronic device 100; in that case, "acquiring the original feature map parameters of each convolutional layer in the neural network model" only requires reading these pre-stored parameters from the storage medium. The present application therefore does not limit how the original feature map parameters of each convolutional layer are obtained.
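As an illustration of the shape derivation described above, the following Python sketch propagates the global input size through a list of convolutional layers using the standard convolution output-size relation; the class and function names here are illustrative assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass
class ConvSpec:
    wt_n: int  # number of output channels (weight N dimension)
    wt_h: int  # kernel height
    wt_w: int  # kernel width
    pad: int   # number of zeros padded
    s: int     # convolution stride

def infer_feature_maps(inf_c, inf_h, inf_w, layers):
    """Derive each layer's original input/output feature map parameters
    from the global input size and the operator parameters."""
    params = []
    for layer in layers:
        ouf_c = layer.wt_n
        ouf_h = (inf_h + 2 * layer.pad - layer.wt_h) // layer.s + 1
        ouf_w = (inf_w + 2 * layer.pad - layer.wt_w) // layer.s + 1
        params.append({"inf": (inf_c, inf_h, inf_w),
                       "ouf": (ouf_c, ouf_h, ouf_w)})
        inf_c, inf_h, inf_w = ouf_c, ouf_h, ouf_w  # each output feeds the next layer
    return params
```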
S110: split the original feature map parameters according to the input-output parameter relation and the memory capacity to obtain a feasible feature map parameter set for each convolutional layer.
In some possible embodiments, because the input, output, and weight sub-blocks of the executable all need to be loaded into SRAM (static random-access memory) when the neural network model is compiled into an executable file, the memory capacity above may be the storage capacity of the SRAM of the electronic device 100. Of course, in other possible embodiments the memory capacity may also be a preset capacity or a part of the SRAM capacity, which the present application does not limit.
According to the basic principle of convolution, there is a fixed correspondence between the input feature map parameters and the output feature map parameters of each convolutional layer of the neural network model. The input-output parameter relation above may coincide with this fixed correspondence; in other words, this fixed correspondence may serve as the input-output parameter relation. Thus, after obtaining the original feature map parameters of each convolutional layer, the electronic device 100 can split the original feature map parameters according to the input-output parameter relation and the memory capacity to obtain a feasible feature map parameter set for each convolutional layer.
The original feature map parameters may include original input feature map parameters and original output feature map parameters. When splitting the original feature map parameters according to the input-output parameter relation and the memory capacity to obtain each layer's feasible feature map parameter set, the split may be based on the original input feature map parameters, on the original output feature map parameters, or on both at the same time; the present application therefore does not limit the specific manner of splitting.
To further improve the efficiency of the method provided by the present application, in some possible embodiments the original feature map parameters may include the original output feature map parameters. As to how to split the original feature map parameters according to the input-output parameter relation and the memory capacity to obtain each layer's feasible feature map parameter set, referring to FIG. 3 on the basis of FIG. 2, S110 may include:
S110A: split the original output feature map parameters according to the input-output parameter relation and the memory capacity to obtain a feasible output feature map parameter set for each convolutional layer.
In some possible embodiments, the original input feature map parameters included in the original feature map parameters may include the channel dimension parameter inf_c, height dimension parameter inf_h, and width dimension parameter inf_w of the input feature map; the original output feature map parameters may include the channel dimension parameter ouf_c, height dimension parameter ouf_h, and width dimension parameter ouf_w of the output feature map; and the original feature map parameters may further include the weight parameters: the weight channel parameter wt_c, weight number parameter wt_n, weight height parameter wt_h, and weight width parameter wt_w.
In the basic principle of convolution, the channel dimension parameter ouf_c of the output feature map has a fixed correspondence with the weight number parameter wt_n, and the height and width dimension parameters ouf_h and ouf_w of the output feature map have fixed correspondences with the height and width dimension parameters inf_h and inf_w of the input feature map. From these fixed correspondences, the following Formula 1 is obtained:
ouf_c = wt_n;
ouf_h = (inf_h + 2*pad - wt_h)/s + 1;
ouf_w = (inf_w + 2*pad - wt_w)/s + 1;
where pad is the number of zeros padded for the convolution and s is the convolution stride; both pad and s are parameter values parsed from the neural network model, that is, pad and s can be regarded as preset parameter values.
Moreover, because the input, output, and weight sub-blocks of the compiled executable must all be loaded into the SRAM space, the maximum size of each sub-block must be smaller than the corresponding maximum SRAM capacity. Assuming the accelerator's SRAM capacities for the input feature map, weights, and output feature map are inf_L, wt_L, and ouf_L respectively, the above sizes must satisfy the following Formula 2:
inf_c*inf_h*inf_w < inf_L;
wt_n*wt_c*wt_h*wt_w < wt_L;
ouf_c*ouf_h*ouf_w < ouf_L;
In particular, when the input feature map, weights, and output feature map share the accelerator's SRAM, assuming the accelerator SRAM capacity is L, Formula 2 becomes:
inf_c*inf_h*inf_w + ouf_c*ouf_h*ouf_w + wt_n*wt_c*wt_h*wt_w < L;
For the convolution computation itself, the width and height of the convolution input feature map must not be smaller than the weight width and height, and the output feature map sub-block size must not exceed the original size. Therefore, assuming the original output feature map size is s_ouf_c*s_ouf_h*s_ouf_w (i.e., the original output feature map dimensions are s_ouf_c, s_ouf_h, and s_ouf_w), the following Formula 3 is obtained:
inf_h >= wt_h; inf_w >= wt_w;
0 < ouf_c <= s_ouf_c; 0 < ouf_h <= s_ouf_h; 0 < ouf_w <= s_ouf_w;
Combining Formula 1, Formula 2, and Formula 3 above then yields the input-output parameter relation, as follows:
inf_c*((ouf_h - 1)*s + wt_h - 2*pad)*((ouf_w - 1)*s + wt_w - 2*pad) + ouf_c*ouf_h*ouf_w + ouf_c*wt_c*wt_h*wt_w < L,
with 0 < ouf_c <= s_ouf_c, 0 < ouf_h <= s_ouf_h, 0 < ouf_w <= s_ouf_w;
To ensure the completeness of the accumulation for computing one output point, that is, to avoid splitting the accumulation for a single output point into several computations and adding intermediate data buffering, the unknowns inf_c, wt_h, and wt_w in the above formulas all use their original values (i.e., inf_c, wt_h, and wt_w are known quantities). Hence only ouf_c, ouf_h, and ouf_w in the above formulas are unknown, and their upper and lower bounds are fixed. Based on the above input-output parameter relation and the memory capacity L, by treating ouf_c, ouf_h, and ouf_w as unknowns and traversing their possible values, all split size combinations of the output feature map can be obtained (i.e., the feasible feature map parameter set of each convolutional layer is obtained).
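To make the traversal concrete, the following Python sketch enumerates the feasible output tile sizes under the shared-SRAM form of the constraint. It is a minimal sketch assuming the shared capacity L and the simplification that a tile's input height and width follow Formula 1 with border padding ignored; the function name is illustrative.

```python
def feasible_splits(s_ouf_c, s_ouf_h, s_ouf_w,
                    inf_c, wt_c, wt_h, wt_w, s, L):
    """Enumerate output tile sizes (ouf_c, ouf_h, ouf_w) whose input tile,
    weight sub-block, and output tile fit together in a shared SRAM of size L."""
    feasible = []
    for ouf_c in range(1, s_ouf_c + 1):
        for ouf_h in range(1, s_ouf_h + 1):
            for ouf_w in range(1, s_ouf_w + 1):
                # Input tile size implied by Formula 1 (border padding ignored).
                inf_h = (ouf_h - 1) * s + wt_h
                inf_w = (ouf_w - 1) * s + wt_w
                in_sz = inf_c * inf_h * inf_w
                out_sz = ouf_c * ouf_h * ouf_w
                wt_sz = ouf_c * wt_c * wt_h * wt_w  # wt_n equals ouf_c
                if in_sz + out_sz + wt_sz < L:
                    feasible.append((ouf_c, ouf_h, ouf_w))
    return feasible
```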
It should be understood that S110A can be seen as the first step of "splitting the original feature map parameters based on the original output feature map parameters to obtain each layer's feasible feature map parameter set". Splitting based on the original output feature map parameters avoids splitting the accumulation for a single output point into several computations and adding intermediate data buffering (i.e., S110A effectively avoids caching intermediate results), which improves the running efficiency of the method provided by the present application.
S110B: determine a feasible input feature map parameter set for each convolutional layer according to the feasible output feature map parameter set of each convolutional layer and the input-output parameter relation.
After the feasible output feature map parameter set of each convolutional layer is obtained, the corresponding feasible input feature map parameters and feasible weight parameters under each split size combination can be derived from each layer's feasible output feature map parameter set together with Formula 1 above (i.e., the feasible input feature map parameter set of each convolutional layer is determined).
S110C: determine the feasible feature map parameter set of each convolutional layer according to the feasible output feature map parameter set and the feasible input feature map parameter set of each convolutional layer.
After the feasible output feature map parameter set and the feasible input feature map parameter set of each convolutional layer are determined, merging the two sets for each layer yields that layer's feasible feature map parameter set.
Referring again to FIG. 2. S120: determine, from the feasible feature map parameter set of each convolutional layer, a corresponding target feature map parameter for each convolutional layer; the target feature map parameter has the highest data-handling efficiency.
In this embodiment, after S100 to S110 are performed, each convolutional layer corresponds to one feasible feature map parameter set, and each feasible feature map parameter set may include multiple feasible feature map parameters. To reduce the runtime computation of the executable file compiled from the neural network model and improve its running efficiency, for a given feasible feature map parameter set, the feasible feature map parameter with the highest data-handling efficiency among its members can be computed and taken as the target feature map parameter corresponding to that set. Performing this operation on every feasible feature map parameter set finally determines, from each convolutional layer's feasible feature map parameter set, a corresponding target feature map parameter for each convolutional layer.
It should be understood that, because the present application can find for each convolutional layer a corresponding target feature map parameter with the highest data-handling efficiency, for a neural network model that includes multiple convolutional layers the present application can find, for each of its convolutional layers, the weight-reuse or feature-map-reuse scheme with the highest data-handling efficiency (i.e., data reuse rate). In other words, the present application can find the best data reuse scheme for each convolutional layer of the neural network model, thereby improving the model's overall data reuse rate, reducing the runtime computation of the corresponding executable file, and improving the executable's computational efficiency (i.e., convolutions of different sizes can each find the best split combination and data reuse scheme, improving the network's overall data reuse rate and inference efficiency). That is, by finding the best split size for a convolution of any size, different types of convolutions within the same network can be split flexibly, improving the network's overall data reuse rate and indirectly improving network inference efficiency.
Therefore, the method provided by the present application can reduce the runtime computation of the executable file compiled from the neural network model and improve the executable's running efficiency. In addition, the method is flexible to implement: the supported convolution sizes are not restricted by the sizes the hardware supports, and large convolutions can be supported.
In some possible embodiments, as to how to determine a corresponding target feature map parameter for each convolutional layer from its feasible feature map parameter set, referring to FIG. 4 on the basis of FIG. 2, S120 may include:
S120A: determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer, the amount of repeated data loading corresponding to each feasible feature map parameter.
In some possible embodiments, after the various split size combinations of the feasible output feature map parameter set, the feasible input feature map parameter set, and the feasible weight parameters are obtained (i.e., after each convolutional layer's feasible feature map parameter set is obtained): when no split is performed, the input feature map, weights, and output feature map each undergo only one DDR (double data rate SDRAM) to SRAM (or SRAM to DDR) data transfer; if a data block as a whole is loaded more than once, the excess is counted as repeated loading. When splitting is performed, the amount of repeated data loading is computed differently depending on the split size combination, as follows:
Calculation mode 1: when only the output channels are split (that is, the weight channel parameter wt_c, weight number parameter wt_n, weight height parameter wt_h, and weight width parameter wt_w are split, while the input feature map parameters inf_c, inf_h, and inf_w and the output feature map parameters ouf_c, ouf_h, and ouf_w remain unchanged), reusing the feature map across the model's convolutional layers is better. In this case, although the weights must be loaded in several passes, the weights loaded in different passes are mutually independent, with no duplication. Hence, with feature map reuse, all data needs to be loaded only once, and the amount of repeated data loading F1 is zero when only the output channels are split.
Calculation mode 2: when both the output width and the output height are split (that is, the input feature map parameters inf_c, inf_h, and inf_w and the output feature map parameters ouf_c, ouf_h, and ouf_w are split, while the weight parameters wt_c, wt_n, wt_h, and wt_w remain unchanged), reusing the weights across the model's convolutional layers is better. Similar to calculation mode 1, the feature map must then be loaded several times while the weights are loaded only once. However, because the width and height are divided, the scanning pattern of the convolution makes the input feature map introduce a fixed amount of repeated data; in this case, the amount of repeated data F2 can be computed with the following formula:
F2 = inf_c*((h_num - 1)*(wt_h - s)*inf_w + (w_num - 1)*(wt_w - s)*inf_h);
where s is the convolution stride, h_num is the number of sub-blocks split along the height (H) dimension of the input feature map, and w_num is the number of sub-blocks split along the width (W) dimension of the input feature map.
Calculation mode 3: when the output width, height, and channels are split simultaneously (that is, the input feature map parameters inf_c, inf_h, and inf_w, the output feature map parameters ouf_c, ouf_h, and ouf_w, and the weight parameters wt_c, wt_n, wt_h, and wt_w are all split), one sub-feature-map must be convolved with multiple weight sub-blocks, and one weight sub-block must also be convolved with multiple sub-feature-maps. In this case, if the feature map is reused, the feature map is loaded only once, but the repeated feature map loading caused by the split must still be counted, and the weights as a whole must be reloaded h_num*w_num - 1 times, so the amount of repeated data is computed as:
F3 = (h_num*w_num - 1)*ouf_c*inf_c*wt_h*wt_w + F2;
If instead the weights are reused, the weights are loaded only once and the feature map as a whole must be reloaded (wt_num - 1) times; in this case the amount of repeated data is computed as:
F3' = (wt_num - 1)*inf_c*inf_h*inf_w + wt_num*F2;
where wt_num is the number of sub-blocks into which the weights are split along the N dimension, and F2 is the repeated data produced by splitting the input feature map along the W and H dimensions.
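The three calculation modes above can be gathered into one helper. The sketch below is a hedged illustration: the halo term for F2 assumes each interior tile boundary reloads (kernel - stride) rows or columns of the input, which reconstructs the spirit of the F2 formula rather than quoting it verbatim.

```python
def repeated_load(mode, inf_c, inf_h, inf_w, wt_h, wt_w, ouf_c, s,
                  h_num=1, w_num=1, wt_num=1, reuse="feature_map"):
    """Amount of repeated DDR<->SRAM loading for the three split modes."""
    # Halo overlap from splitting the input along H and W (assumed form of F2).
    f2 = inf_c * ((h_num - 1) * (wt_h - s) * inf_w
                  + (w_num - 1) * (wt_w - s) * inf_h)
    if mode == 1:   # split output channels only: no duplicated loads (F1 = 0)
        return 0
    if mode == 2:   # split output width/height only: input halos (F2)
        return f2
    if mode == 3:   # split width, height, and channels together
        if reuse == "feature_map":
            # Weights reloaded h_num*w_num - 1 extra times (F3).
            return (h_num * w_num - 1) * ouf_c * inf_c * wt_h * wt_w + f2
        # Weights reused: feature map reloaded wt_num - 1 extra times (F3').
        return (wt_num - 1) * inf_c * inf_h * inf_w + wt_num * f2
    raise ValueError("mode must be 1, 2 or 3")
```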
S120B: take the feasible feature map parameter with the smallest amount of repeated data loading in each convolutional layer's feasible feature map parameter set as the target feature map parameter corresponding to that convolutional layer.
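A minimal sketch of this selection step, assuming each candidate parameter has been paired with its repeated-load figure computed as above:

```python
def pick_target_param(candidates):
    """candidates: list of (feasible_param, repeated_load) pairs for one layer.
    Returns the parameter with the least repeated data loading (S120B)."""
    return min(candidates, key=lambda c: c[1])[0]
```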
In other possible embodiments, to further improve the efficiency of the method provided by the present application where the neural network model includes multiple convolutional layers, as to how to determine a corresponding target feature map parameter for each convolutional layer from its feasible feature map parameter set, referring to FIG. 5 on the basis of FIG. 2, S120 may include:
S120a: determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of the target convolutional layer, the amount of repeated data loading corresponding to each feasible feature map parameter.
In this embodiment, for how to determine the amount of repeated data loading corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the target convolutional layer's feasible feature map parameter set, refer to S120A above; the details are not repeated here.
S120b: determine an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to a preset number of processing cores and the amount of repeated data loading; the target convolutional layer is any one of the multiple convolutional layers, and the optimal feature map parameter is the feasible feature map parameter with the highest data-handling efficiency in the target convolutional layer's feasible feature map parameter set.
In some possible embodiments, after the amount of repeated data loading F of each split size combination has been counted, the feasible feature map parameter with the highest data-handling efficiency can be determined by jointly considering the amount of repeated data loading and the MAC resource utilization, that is, by looking for the best split combination under the criteria of less repeated data loading and higher MAC resource utilization.
Hardware MAC resource utilization is tied to the specific hardware layout. In the hardware design, assume the hardware's MAC resources are divided into Npe groups, each group called a PE (logical core), with each PE computing the data of one output channel; efficiency is therefore highest when the number of output channels is a multiple of Npe. On this basis, the data-handling efficiency of any two feasible feature map parameters in the target convolutional layer's feasible feature map parameter set can be compared by computing a quantity w from the following values:
Assume the two feasible feature map parameters are group1 and group2. Ts is the time to transfer one unit of data, and Tp is the time to compute a convolution of the given size for a single channel; both Ts and Tp can be obtained by testing in advance. s_ouf_c is the original number of output channels; ouf_c_group1 and ouf_c_group2 are the maximum numbers of output channels of group1 and group2, respectively; ouf_clast_group1 and ouf_clast_group2 are the numbers of output channels of the last sub-block produced by each split, respectively. The computation uses floor and ceiling rounding. When the computed result w is greater than zero, group2 is better; otherwise, group1 is better.
Then, by traversing the target convolutional layer's feasible feature map parameter set and comparing every pair of feasible feature map parameters in this way, an optimal feature map parameter can be determined from the target convolutional layer's feasible feature map parameter set.
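Because the exact formula for w appears only in the original filing's figures, the sketch below implements one plausible cost model consistent with the surrounding description: a grouping's cost is its transfer time plus its compute time, with output channels scheduled onto Npe PEs in ceil-sized waves. Ts, Tp, and the cost structure are assumptions, not the patent's verbatim formula.

```python
import math

def group_cost(s_ouf_c, ouf_c, data_volume, Npe, Ts, Tp):
    """Estimated time for one split grouping: data transfer plus compute,
    with one output channel per PE and ceil(ouf_c / Npe) scheduling waves."""
    n_blocks = math.ceil(s_ouf_c / ouf_c)  # sub-blocks along output channels
    waves = math.ceil(ouf_c / Npe)         # PE waves per sub-block
    return data_volume * Ts + n_blocks * waves * Tp

def better_group(g1, g2, s_ouf_c, Npe, Ts, Tp):
    """Compare two groupings; a positive w means group2 is better, as in the text.
    g1, g2: dicts with 'ouf_c' (channels per sub-block) and 'volume'
    (total data moved, including repeated loads)."""
    w = (group_cost(s_ouf_c, g1["ouf_c"], g1["volume"], Npe, Ts, Tp)
         - group_cost(s_ouf_c, g2["ouf_c"], g2["volume"], Npe, Ts, Tp))
    return g2 if w > 0 else g1
```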
S120c: take the optimal feature map parameter as the target feature map parameter corresponding to the target convolutional layer.
Since the target convolutional layer is any one of the multiple convolutional layers, performing S120a, S120b, and S120c on every convolutional layer's feasible feature map parameter set determines, for each convolutional layer, a corresponding target feature map parameter with the highest data-handling efficiency.
Referring again to FIG. 2. S130: generate an executable file for the neural network model according to the target feature map parameter corresponding to each convolutional layer.
In some possible embodiments, as to how to generate an executable file for the neural network model according to each convolutional layer's target feature map parameter, S130 may include: generating three-dimensional direct memory access (DMA) data-transfer instructions according to the target feature map parameter of each convolutional layer, and generating the executable file of the neural network model according to the three-dimensional DMA data-transfer instructions.
It can be understood that by using three-dimensional DMA to transfer the feature maps, the sub-blocks can be seamlessly stitched together in their three-dimensional logical layout directly while the sub-feature-maps are written out, without additional slice/concatenate operations.
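As a hedged illustration of what such an instruction might carry, the descriptor below parameterizes a three-dimensional strided copy (tile extents plus the destination tensor's strides); the patent does not specify an instruction format, so all field and function names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Dma3dDescriptor:
    src_addr: int      # SRAM address of the computed sub-block
    dst_addr: int      # DDR address of the tile's origin in the full tensor
    size_c: int        # tile extent in channels
    size_h: int        # tile extent in rows
    size_w: int        # tile extent in elements per row
    dst_stride_c: int  # full-tensor channel stride (ouf_h * ouf_w)
    dst_stride_h: int  # full-tensor row stride (ouf_w)

def expand_tile_writeback(desc):
    """Expand one 3D DMA descriptor into flat (src, dst, length) copies;
    a real accelerator would execute this as a single instruction, stitching
    the tile into the full output tensor without slice/concatenate passes."""
    copies = []
    for c in range(desc.size_c):
        for h in range(desc.size_h):
            dst = (desc.dst_addr + c * desc.dst_stride_c
                   + h * desc.dst_stride_h)
            src = desc.src_addr + (c * desc.size_h + h) * desc.size_w
            copies.append((src, dst, desc.size_w))
    return copies
```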
Further, in order to realize the data processing function of the neural network model, building on FIG. 2 and referring to FIG. 6, the method may further include, after S130:
S140, executing the executable file to realize the data processing function of the neural network model.
In this embodiment, the data processing function of the neural network model may be natural language processing, image recognition, speech recognition, data mining, personalized recommendation, or the like. Moreover, because the present application performs the feature map splitting offline in the compiler, it reduces runtime processing logic, lowers hardware logic complexity, and lowers implementation cost.
It should be understood that the neural network model compiling method provided by the present application splits the original feature map parameters of each convolutional layer in the neural network model and finds, for each convolutional layer, a corresponding target feature map parameter with the highest data transfer efficiency. This improves the overall data reuse rate of the neural network model, reduces the runtime computation of the executable file corresponding to the model, and improves the computational efficiency of that executable file. That is, the present application can compile a neural network model into an executable file while reducing the executable file's runtime computation and improving its running efficiency.
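Tying the steps together, here is a minimal sketch of the flow S100–S140 described above. Every helper name is hypothetical shorthand for the corresponding step, not an actual compiler API; `select_optimal` refers to the pairwise-comparison sketch earlier, and `enumerate_feasible_splits` to the split-enumeration sketch given later with the apparatus description.

```python
def compile_model(model, memory_capacity, Ts, Tp):
    instructions = []
    for layer in model.conv_layers:
        original = layer.original_feature_map_params             # S100
        feasible = enumerate_feasible_splits(layer,
                                             memory_capacity)    # S110
        target = select_optimal(feasible, original.out_channels,
                                Ts, Tp)                          # S120
        instructions += emit_3d_dma_program(layer, target)       # S130
    return make_executable(instructions)
# Executing the returned file then performs the model's data processing
# function (S140).
```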
To perform the corresponding steps of the above embodiment and its possible implementations, an implementation of a neural network model compiling apparatus is given below. Please refer to FIG. 7, which shows a functional block diagram of the neural network model compiling apparatus provided by an embodiment of the present application. It should be noted that the compiling apparatus 200 for a neural network model provided by this embodiment has the same basic principles and technical effects as the above embodiments; for brevity, where this embodiment is silent, refer to the corresponding content of the above embodiments. The compiling apparatus 200 includes an obtaining module 210, a splitting module 220, a generating module 230, and a running module 240.
Optionally, the above modules may be stored in a memory in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100 provided by the present application, and may be executed by a processor of the electronic device 100. Meanwhile, the data, program code, and the like required to execute the above modules may be stored in the memory.
The obtaining module 210 may be configured to obtain the original feature map parameters of each convolutional layer in the neural network model.
It can be understood that the obtaining module 210 may be configured to support the electronic device 100 in performing the above S100 and the like, and/or other processes of the techniques described herein.
The splitting module 220 may be configured to split the original feature map parameters according to the input/output parameter relational expression and the memory capacity to obtain the feasible feature map parameter set of each convolutional layer.
It can be understood that the splitting module 220 may be configured to support the electronic device 100 in performing the above S110 and the like, and/or other processes of the techniques described herein.
The splitting module 220 may also be configured to determine, from the feasible feature map parameter set of each convolutional layer, a corresponding target feature map parameter for each convolutional layer, the target feature map parameter corresponding to the highest data transfer efficiency.
It can be understood that the splitting module 220 may be configured to support the electronic device 100 in performing the above S120 and the like, and/or other processes of the techniques described herein.
The generating module 230 may be configured to generate an executable file for the neural network model according to the target feature map parameter corresponding to each convolutional layer.
It can be understood that the generating module 230 may be configured to support the electronic device 100 in performing the above S130 and the like, and/or other processes of the techniques described herein.
The running module 240 may be configured to execute the executable file to realize the data processing function of the neural network model.
It can be understood that the running module 240 may be configured to support the electronic device 100 in performing the above S140 and the like, and/or other processes of the techniques described herein.
In some possible embodiments, the original feature map parameters in the present application may include original output feature map parameters. As to how to "split the original feature map parameters according to the input/output parameter relational expression and the memory capacity to obtain the feasible feature map parameter set of each convolutional layer", the splitting module 220 may be configured to split the original output feature map parameters according to the input/output parameter relational expression and the memory capacity to obtain the feasible output feature map parameter set of each convolutional layer; the splitting module 220 may further be configured to determine the feasible input feature map parameter set of each convolutional layer according to its feasible output feature map parameter set and the input/output parameter relational expression; and the splitting module 220 may further be configured to determine the feasible feature map parameter set of each convolutional layer according to its feasible output feature map parameter set and feasible input feature map parameter set.
It can be understood that the splitting module 220 may be configured to support the electronic device 100 in performing the above S110A, S110B, S110C, and the like, and/or other processes of the techniques described herein.
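A minimal sketch of S110A–S110C under stated assumptions: consistent with the channel grouping used in the comparison earlier, the output feature map is split along its channel axis, and the input/output relationship of a convolution then implies that every output-channel group still needs the whole input feature map while the weights split with the output channels. The layer attribute names and the byte model are illustrative, not the patent's.

```python
import math

def enumerate_feasible_splits(layer, memory_capacity, elem_size=1):
    # The full input feature map is needed by every output-channel group.
    in_bytes = layer.in_channels * layer.in_h * layer.in_w * elem_size
    feasible = []
    for ouf_c in range(1, layer.out_channels + 1):    # candidate group size
        out_bytes = ouf_c * layer.out_h * layer.out_w * elem_size
        weight_bytes = (ouf_c * layer.in_channels
                        * layer.kernel * layer.kernel * elem_size)
        # S110A/S110B: a split is feasible only if one sub-block's input,
        # output, and weights fit in on-chip memory together.
        if in_bytes + out_bytes + weight_bytes <= memory_capacity:
            last = layer.out_channels % ouf_c or ouf_c   # last block's channels
            feasible.append({"ouf_c": ouf_c, "ouf_clast": last,
                             "bytes_per_block": out_bytes})
    return feasible  # S110C: the layer's feasible feature-map parameter set
```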
In some possible embodiments, as to how to "determine a corresponding target feature map parameter for each convolutional layer from its feasible feature map parameter set", the splitting module 220 may be configured to determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer, the amount of repeatedly loaded data corresponding to each feasible feature map parameter; the splitting module 220 may further be configured to take the feasible feature map parameter with the smallest amount of repeatedly loaded data in each convolutional layer's feasible feature map parameter set as that layer's target feature map parameter.
It can be understood that the splitting module 220 may be configured to support the electronic device 100 in performing the above S120A, S120B, and the like, and/or other processes of the techniques described herein.
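For S120A/S120B, a sketch under the same assumptions as above: with ⌈s_ouf_c / ouf_c⌉ output-channel groups, the input feature map is reloaded once per group, so the repeatedly loaded volume grows with the group count, and the parameter minimizing it is chosen. The byte model is again an assumption.

```python
import math

def repeat_load_bytes(group, s_ouf_c, input_bytes):
    # Every load of the input beyond the first one is repeated traffic.
    n_blocks = math.ceil(s_ouf_c / group["ouf_c"])
    return (n_blocks - 1) * input_bytes

def select_by_repeat_load(feasible_groups, s_ouf_c, input_bytes):
    # S120B: the feasible parameter with the smallest repeated load wins.
    return min(feasible_groups,
               key=lambda g: repeat_load_bytes(g, s_ouf_c, input_bytes))
```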
In other possible embodiments, to further improve the running efficiency of the method provided by the present application where the neural network model includes multiple convolutional layers, as to how to "determine a corresponding target feature map parameter for each convolutional layer from its feasible feature map parameter set", the splitting module 220 may be configured to determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of a target convolutional layer, the amount of repeatedly loaded data corresponding to each feasible feature map parameter; the splitting module 220 may further be configured to determine an optimal feature map parameter from the target convolutional layer's feasible feature map parameter set according to a preset number of processing cores and the amounts of repeatedly loaded data, where the target convolutional layer is any one of the multiple convolutional layers and the optimal feature map parameter is the feasible feature map parameter with the highest data transfer efficiency in the set; and the splitting module 220 may further be configured to take the optimal feature map parameter as the target feature map parameter corresponding to the target convolutional layer.
It can be understood that the splitting module 220 may be configured to support the electronic device 100 in performing the above S120a, S120b, S120c, and the like, and/or other processes of the techniques described herein.
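The text does not spell out how the preset number of processing cores enters the selection. As one hedged reading, the sketch below keeps the repeated-load ranking from the previous sketch and breaks ties in favor of splits whose sub-block count leaves no cores idle in the last scheduling round; this tie-break is purely an assumed heuristic, not the patent's rule.

```python
import math

def select_with_cores(feasible_groups, s_ouf_c, input_bytes, num_cores):
    def key(group):
        n_blocks = math.ceil(s_ouf_c / group["ouf_c"])
        repeat = (n_blocks - 1) * input_bytes   # repeated-load volume first
        idle = (-n_blocks) % num_cores          # cores idle in the last round
        return (repeat, idle)
    return min(feasible_groups, key=key)
```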
Based on the above method embodiments, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the above neural network model compiling method.
Specifically, the storage medium may be a general-purpose storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above neural network model compiling method can be performed, thereby addressing the problem that current optimization methods are relatively simple and the compiled executable files still run very inefficiently, and achieving the goal of compiling a neural network model into an executable file, reducing the executable file's runtime computation, and improving its running efficiency.
In summary, embodiments of the present application provide a neural network model compiling method, apparatus, storage medium, and electronic device. The method includes: obtaining the original feature map parameters of each convolutional layer in the neural network model; splitting the original feature map parameters according to the input/output parameter relational expression and the memory capacity to obtain the feasible feature map parameter set of each convolutional layer; determining, from each convolutional layer's feasible feature map parameter set, a corresponding target feature map parameter for each convolutional layer, the target feature map parameter corresponding to the highest data transfer efficiency; and generating an executable file for the neural network model according to the target feature map parameter corresponding to each convolutional layer. Through this compiling method, the original feature map parameters of each convolutional layer can be split and a corresponding target feature map parameter with the highest data transfer efficiency found for each convolutional layer, which improves the overall data reuse rate of the neural network model, reduces the runtime computation of the executable file corresponding to the model, and improves the executable file's computational efficiency. That is, the present application can compile a neural network model into an executable file while reducing the executable file's runtime computation and improving its running efficiency.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.