CN109934336A - Design method of neural network dynamic acceleration platform based on optimal structure search and neural network dynamic acceleration platform

Info

Publication number
CN109934336A
Authority
CN
China
Prior art keywords
neural network
sub
unit
pooling
convolution
Prior art date
Legal status
Granted
Application number
CN201910175975.0A
Other languages
Chinese (zh)
Other versions
CN109934336B (en)
Inventor
虞致国
马晓杰
顾晓峰
魏敬和
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201910175975.0A
Publication of CN109934336A
Application granted
Publication of CN109934336B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention provides a design method for a neural network dynamic acceleration platform based on optimal structure search. The platform comprises a control terminal and a hardware acceleration terminal. The control terminal trains a controller neural network which, according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, updates the structure of the sub-neural network, generates its structure parameters, retrains the updated sub-network to generate weight parameters, and sends a configuration file, containing the structure parameters and weight parameters, to the hardware acceleration terminal. Once the inference accuracy returned by the hardware acceleration terminal stabilizes, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found. The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal. The invention can dynamically search for the optimal structure of a sub-neural network that requires hardware inference acceleration.

Description

Design method of a neural network dynamic acceleration platform based on optimal structure search, and neural network dynamic acceleration platform

Technical Field

The invention relates to the field of intelligent platform computing, and in particular to a design method for a neural network dynamic acceleration platform based on optimal network structure search.

Background Art

Deep neural networks (DNNs) have demonstrated great value; among them, convolutional neural networks (CNNs) have clear advantages over traditional image-recognition schemes. As requirements rise, deeper networks and ever-larger datasets have become the mainstream line of CNN development. At the same time, the application of deep neural networks faces several problems:

(1) Training a convolutional neural network takes considerable time. The CNN algorithm realizes convolution mainly through a large number of multiplications: the CNN model proposed by LeCun et al. in 1998 for handwritten-character recognition required fewer than 2.3×10^7 multiplications, the AlexNet model designed by Krizhevsky et al. in 2012 required 1.1×10^9, and the CNN model proposed by Simonyan and Zisserman in 2015 required more than 1.6×10^10.

(2) Large deep neural networks consume considerable power at run time and run inefficiently on general-purpose processors, because the model must be stored in external DRAM and fetched in real time while inferring on images or speech. The table below shows the power consumed by basic arithmetic and memory operations in a 45 nm CMOS processor. Without optimization of the network structure and the hardware architecture, access to and computation on the model data consume a great deal of power, which is unacceptable for embedded mobile devices in particular.

Table 1. Power consumption of a 45 nm CMOS processor

In view of the above requirements and challenges, and under constraints such as hardware resources and accuracy requirements, it is urgent to provide a design method for a neural network dynamic acceleration platform based on optimal structure search that achieves high hardware computing efficiency and a high performance-to-power ratio.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing a design method for a neural network dynamic acceleration platform based on optimal structure search, together with the platform itself, which can dynamically search for the optimal structure of a sub-neural network that requires hardware inference acceleration and complete the accelerated inference of that sub-network. The technical scheme adopted by the present invention is as follows:

A design method for a neural network dynamic acceleration platform based on optimal structure search, comprising:

S1. The controller neural network on the control terminal of the platform updates the structure of the sub-neural network and generates its structure parameters according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement; the control terminal retrains the updated sub-network to generate its weight parameters and finally generates the memory addresses and access modes; a configuration file is formed from the structure parameters, weight parameters, memory addresses, and access modes.

S2. The configuration file is written into the configuration cache module on the hardware acceleration terminal; the numbers of convolution, pooling, and linear units in the core computing module are updated according to the numbers of convolutional, pooling, and fully connected layers in the configuration file; the connections between units are updated according to the layer-connection parameters in the configuration file; and the structures of the convolution and pooling units are updated according to the structure parameters of the convolutional and pooling layers.

S3. Input data are read through the data input unit and convolved by the convolution units; the classification result is then obtained through the linear units, pooling units, and classification unit, while the inference accuracy of the sub-neural network is computed; finally, the classification result and the inference accuracy are output through the data output unit.

S4. The inference accuracy of the sub-neural network is fed back to the control terminal again; the controller neural network updates the sub-network once more according to this accuracy and the preset accuracy requirement and generates new structure parameters; the control terminal retrains the updated sub-network, generates its weight parameters, and generates the memory addresses and access modes, from which a new configuration file is formed.

S5. The process iterates in this way. Once the sub-network accuracy returned by the hardware acceleration terminal stabilizes, the controller neural network stops updating the sub-network's structure and the structure parameters remain unchanged; when the controller stops updating, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found.

A neural network dynamic acceleration platform, comprising a control terminal and a hardware acceleration terminal:

The control terminal is used to train the controller neural network. According to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, the controller updates the structure of the sub-neural network, generates its structure parameters, retrains the updated sub-network to generate weight parameters, and sends a configuration file, containing the structure parameters and weight parameters, to the hardware acceleration terminal. Once the inference accuracy returned by the hardware acceleration terminal stabilizes, the controller stops updating the sub-network; the optimal structure of the sub-neural network on the hardware acceleration terminal has then been found.

The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal.

The hardware acceleration terminal is an ASIC-based hardware inference accelerator for the sub-neural network. It accepts the configuration file generated by the control terminal, realizes the sub-network in hardware, accelerates its inference, and feeds the inference accuracy back to the control terminal, ultimately searching out the optimal structure of the sub-network and completing hardware-accelerated inference of the optimally structured sub-network.

Specifically:

The structure of the sub-neural network comprises the numbers of convolutional, pooling, and fully connected layers, the structure of the convolutional layers, the structure of the pooling layers, and the connections between layers. A convolutional-layer structure comprises the number, size, and stride of its kernels; a pooling-layer structure comprises the pooling-window size and stride.

The parameters in the configuration file include:

the numbers of convolutional, pooling, and fully connected layers, and the connections between layers;

the structure of each convolutional layer: the number, size, and stride of its kernels;

the structure of each pooling layer: the pooling-window size and stride;

the weight parameters of the retrained sub-neural network;

the memory addresses and memory access modes.

The hardware acceleration terminal comprises a data input module, a configuration cache module, a core computing module, and a data output module. The core computing module comprises convolution units, pooling units, linear units, and a classification unit; the numbers of convolution, pooling, and linear units correspond respectively to the numbers of convolutional, pooling, and fully connected layers in the configuration file. The hardware structure of each convolution unit is determined by the convolutional-layer structure parameters in the configuration file, that of each pooling unit by the pooling-layer structure parameters, and the hardware connections between units in the core computing module by the layer-connection parameters in the configuration file.

The data input module divides the input into n single-channel data blocks according to the number of input channels. The configuration cache module buffers the configuration file generated by the control terminal. In the core computing module, the m kernels of a convolution unit convolve the n data blocks in turn with stride k, producing m feature maps, where m and k denote the kernel count and stride of the unit, corresponding to the kernel count and stride in the configuration file; the feature maps are passed to a pooling unit for feature subsampling, after which the linear units and the classification unit produce the classification result while the inference accuracy of the sub-neural network is computed; finally, the data output unit outputs the classification result and the inference accuracy.

The advantage of the present invention is that the proposed design method can, under given hardware resources, variable accuracy requirements, and changing external parameters, dynamically search for the optimal structure of the sub-neural network on the hardware acceleration terminal and accelerate inference of the optimally structured sub-network. It meets the requirements of neural-network processing chips for a high performance-to-power ratio, low latency, and variable precision, and solves the prior-art problem that a processor cannot be applied efficiently to multiple functions and platforms at the same time.

Brief Description of the Drawings

Fig. 1 is a hierarchy diagram of the neural network dynamic acceleration platform based on optimal structure search according to the present invention.

Fig. 2 is a schematic diagram of the structure of the neural network dynamic acceleration platform of the present invention.

Fig. 3 is a schematic diagram of the computation flow of the controller neural network of the present invention.

Fig. 4 is a schematic diagram of the computation performed by a convolution unit of the present invention.

Fig. 5 is a schematic diagram of a linear unit and the classification unit of the present invention.

Fig. 6 is a schematic diagram of the piecewise nonlinear approximation of the Sigmoid function of the present invention.

Detailed Description of the Embodiments

The present invention is further described below with reference to the drawings and embodiments.

Fig. 1 shows the hierarchy diagram of the proposed neural network dynamic acceleration platform based on optimal structure search (hereinafter, the acceleration platform), which comprises a control layer, an application layer, a connection layer, and a hardware layer.

The control layer and the application layer both belong to the software level. The control layer trains the controller neural network, searches for the optimal structure of the sub-neural network, and retrains the optimally structured sub-network.

In the application layer, the user feeds data to the hardware acceleration terminal by calling the supported hardware programming interfaces.

The connection layer transmits the configuration file, composed of the sub-network structure parameters, weight parameters, and so on, together with the sub-network inference accuracy fed back by the hardware acceleration terminal.

The hardware layer provides the inference-acceleration function of the sub-neural network and comprises the data input module, configuration cache module, core computing module, data output module, and other modules.

Fig. 2 shows the structure of the proposed acceleration platform, which comprises a control terminal and a hardware acceleration terminal.

The control terminal is a server containing a graphics processor and is used to train the controller neural network. According to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, the controller updates the structure of the sub-network, generates its structure parameters, retrains the updated sub-network to generate weight parameters, and sends a configuration file, containing the structure parameters and weight parameters, to the hardware acceleration terminal. Once the inference accuracy returned by the hardware acceleration terminal stabilizes, the controller stops updating the sub-network; the optimal structure of the sub-neural network on the hardware acceleration terminal has then been found.

The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal. Its structure comprises the numbers of convolutional, pooling, and fully connected layers, the structure of the convolutional layers, the structure of the pooling layers, and the connections between layers; a convolutional-layer structure comprises the number, size, and stride of its kernels, and a pooling-layer structure comprises the pooling-window size and stride.

The parameters in the configuration file include (a sketch of such a file follows this list):

the numbers of convolutional, pooling, and fully connected layers, and the connections between layers;

the structure of each convolutional layer: the number, size, and stride of its kernels;

the structure of each pooling layer: the pooling-window size and stride;

the weight parameters of the retrained sub-neural network;

the memory addresses and memory access modes.
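As a rough illustration of such a configuration file, the following Python sketch packs the listed parameters into a dictionary. The field names, example values, and the dictionary representation itself are assumptions made for illustration; the patent does not specify a serialization format.

```python
import numpy as np

# Hypothetical configuration file for one sub-neural-network structure.
# All field names and values are illustrative assumptions.
config = {
    "num_conv_layers": 2,
    "num_pool_layers": 2,
    "num_fc_layers": 1,
    # connection order of the layers (layer-connection parameters)
    "connections": ["conv0", "pool0", "conv1", "pool1", "fc0"],
    # per-convolutional-layer structure: kernel count, size, stride
    "conv_layers": [
        {"num_kernels": 8,  "kernel_size": 3, "stride": 1},
        {"num_kernels": 16, "kernel_size": 3, "stride": 1},
    ],
    # per-pooling-layer structure: window size, stride
    "pool_layers": [
        {"window": 2, "stride": 2},
        {"window": 2, "stride": 2},
    ],
    # weight parameters of the retrained sub-network, e.g. numpy arrays
    "weights": {"conv0": np.zeros((8, 1, 3, 3)), "fc0": np.zeros((10, 64))},
    # memory addresses and access modes for the hardware terminal
    "memory": {"base_address": 0x0000, "access_modes": ["main", "data", "weight"]},
}
```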

The hardware acceleration terminal is an ASIC-based hardware inference accelerator for the sub-neural network. It accepts the configuration file generated by the control terminal, realizes the sub-network in hardware, accelerates its inference, and feeds the inference accuracy back to the control terminal, ultimately searching out the optimal structure of the sub-network and completing hardware-accelerated inference of the optimally structured sub-network.

The hardware acceleration terminal comprises a data input module, a configuration cache module, a core computing module, and a data output module. The core computing module comprises convolution units, pooling units, linear units, and a classification unit; the numbers of convolution, pooling, and linear units correspond respectively to the numbers of convolutional, pooling, and fully connected layers in the configuration file. The hardware structure of each convolution unit is determined by the convolutional-layer structure parameters in the configuration file, that of each pooling unit by the pooling-layer structure parameters, and the hardware connections between units in the core computing module by the layer-connection parameters in the configuration file.

The data input module divides the input into n single-channel data blocks according to the number of input channels. The configuration cache module buffers the configuration file generated by the control terminal. In the core computing module, the m kernels of a convolution unit convolve the n data blocks in turn with stride k, producing m feature maps, where m and k denote the kernel count and stride of the unit, corresponding to the kernel count and stride in the configuration file; the feature maps are passed to a pooling unit for feature subsampling, after which the linear units and the classification unit produce the classification result while the inference accuracy of the sub-neural network is computed; finally, the data output unit outputs the classification result and the inference accuracy.

The main flow of the proposed design method for the neural network dynamic acceleration platform based on optimal structure search is as follows:

S1. The controller neural network on the control terminal of the platform updates the structure of the sub-neural network and generates its structure parameters according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement; the control terminal retrains the updated sub-network to generate its weight parameters and finally generates the memory addresses and access modes; a configuration file is formed from the structure parameters, weight parameters, memory addresses, and access modes.

S2. The configuration file is written into the configuration cache module on the hardware acceleration terminal; the numbers of convolution, pooling, and linear units in the core computing module are updated according to the numbers of convolutional, pooling, and fully connected layers in the configuration file; the connections between units are updated according to the layer-connection parameters in the configuration file; and the structures of the convolution and pooling units are updated according to the structure parameters of the convolutional and pooling layers.

S3. Input data are read through the data input unit and convolved by the convolution units; the classification result is then obtained through the linear units, pooling units, and classification unit, while the inference accuracy of the sub-neural network is computed; finally, the classification result and the inference accuracy are output through the data output unit.

S4. The inference accuracy of the sub-neural network is fed back to the control terminal again; the controller neural network updates the sub-network once more according to this accuracy and the preset accuracy requirement and generates new structure parameters; the control terminal retrains the updated sub-network, generates its weight parameters, and generates the memory addresses and access modes, from which a new configuration file is formed.

S5. The process iterates in this way. Once the sub-network accuracy returned by the hardware acceleration terminal stabilizes, the controller neural network stops updating the sub-network's structure and the structure parameters remain unchanged; when the controller stops updating, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found.

By this method, the optimal structure of the sub-neural network on the hardware acceleration terminal can be searched dynamically, and inference of the optimally structured sub-network is accelerated.
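The S1-S5 iteration can be summarized in a short sketch. In the Python below, controller_update, retrain, make_config, and hardware_inference are hypothetical stand-ins for the control terminal, the retraining step, configuration-file generation, and the hardware acceleration terminal; the three-sample stability test is likewise an assumption, since the patent only requires that the returned accuracy remains stable.

```python
def search_optimal_structure(controller_update, retrain, make_config,
                             hardware_inference, accuracy_requirement,
                             max_iters=100, tol=1e-3):
    """Iterate S1-S5 until the accuracy returned by the hardware stabilizes.

    The four callables are hypothetical stand-ins for the control terminal,
    retraining, configuration-file generation, and the hardware terminal.
    """
    history, structure, accuracy = [], None, 0.0
    for _ in range(max_iters):
        # S1/S4: the controller RNN proposes a new structure from feedback.
        structure = controller_update(accuracy, accuracy_requirement)
        weights = retrain(structure)              # retrain the updated sub-network
        config = make_config(structure, weights)  # structure + weights + memory map
        # S2/S3: the hardware terminal reconfigures itself and runs inference.
        accuracy = hardware_inference(config)
        history.append(accuracy)
        # S5: stop once the returned accuracy no longer changes appreciably.
        if len(history) >= 3 and max(history[-3:]) - min(history[-3:]) < tol:
            break
    return structure
```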

More specifically:

On the control side, the controller neural network of the control terminal is a recurrent neural network that generates the configuration file.

The recurrent neural network has a tree structure, as shown in Fig. 3; the computation proceeds as follows:

A1) The inputs x_t and h_{t-1} are processed along two paths. One path multiplies x_t and h_{t-1} to produce the current memory cell c_t, where x_t is the sequence formed from the inference accuracy fed back by the sub-neural network and the accuracy requirement, and h_{t-1} is the sub-network structure parameter output by the controller in the previous step.

A2) The product is activated with the ReLU function: ReLU(h_{t-1} × x_t).

A3) The other path first adds x_t and h_{t-1} and applies the tanh activation: tanh(h_{t-1} + x_t).

A4) The result of the previous step is added to the previous memory cell c_{t-1}.

A5) The ReLU activation is applied again: ReLU(tanh(h_{t-1} + x_t) + c_{t-1}).

A6) The results of the two paths are multiplied and passed through the sigmoid function; the final output is:

h_t = sigmoid(ReLU(x_t × h_{t-1}) × ReLU(tanh(h_{t-1} + x_t) + c_{t-1}))

h_t represents the structure parameters of the sub-neural network output by the controller.
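Steps A1-A6 describe a single recurrent cell. A minimal NumPy sketch of that cell follows; treating x_t, h_{t-1}, and c_{t-1} as plain vectors combined element-wise is an assumption made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def controller_cell(x_t, h_prev, c_prev):
    """One step of the controller RNN cell (steps A1-A6)."""
    c_t = x_t * h_prev                            # A1: current memory cell
    path1 = relu(h_prev * x_t)                    # A2: ReLU of the product
    path2 = relu(np.tanh(h_prev + x_t) + c_prev)  # A3-A5: tanh, add c_{t-1}, ReLU
    h_t = sigmoid(path1 * path2)                  # A6: new structure parameters
    return h_t, c_t  # h_t and c_t feed the next step as h_{t-1} and c_{t-1}
```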

Once the sub-network accuracy returned by the hardware acceleration terminal remains stable, the controller stops updating the sub-network's structure and the structure parameters remain unchanged.

On the hardware acceleration side:

The data input module divides the input into n single-channel data blocks according to the number of input channels; the input data are matrices converted from image or signal data.

The configuration cache module buffers the configuration file generated by the control terminal; it is read by the core computing module.

One convolution unit performs the computation of one convolutional layer: the m kernels of the layer convolve the n data blocks in turn with stride k, producing m feature maps, and this is done l times, where n is the number of input channels and l, m, and k denote the number of convolution units, the number of kernels per unit, and the convolution stride, corresponding respectively to the number of convolutional layers, the kernel count, and the stride in the configuration file, as shown in Fig. 4.
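A behavioral sketch of one such convolution unit is given below, assuming square data blocks, no padding, and summation of each kernel's response over all n input blocks; these conventions are assumptions, as the patent does not fix them.

```python
import numpy as np

def conv_unit(blocks, kernels, k):
    """m kernels convolve n single-channel blocks with stride k -> m feature maps.

    blocks:  (n, H, W) single-channel data blocks from the data input module
    kernels: (m, n, K, K) convolution kernels of one convolutional layer
    k:       convolution stride
    """
    n, H, W = blocks.shape
    m, _, K, _ = kernels.shape
    out_h = (H - K) // k + 1
    out_w = (W - K) // k + 1
    maps = np.zeros((m, out_h, out_w))
    for i in range(m):                      # each kernel produces one feature map
        for y in range(out_h):
            for x in range(out_w):
                patch = blocks[:, y*k:y*k+K, x*k:x*k+K]
                maps[i, y, x] = np.sum(patch * kernels[i])
    return maps
```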

One pooling unit performs the computation of one pooling layer, accepting data from a convolution unit and subsampling the features; this is done o times, where o is the number of pooling units, corresponding to the number of pooling layers in the configuration file.

One linear unit performs the computation of one fully connected layer, accepting data from a pooling unit and computing a = F(W_i × c + b); this is repeated T times, and the output is handed to the classification unit to produce the classification result, which is finally sent to the data output module. Here W_i is a weight parameter and T is the number of linear units, corresponding to the number of fully connected layers in the configuration file; c is the data output by the pooling unit, b is the bias of the sub-neural network, a is the output of the fully connected layer, and F(·) is the activation function, generally Sigmoid or ReLU, as shown in Fig. 5.
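A sketch of one linear unit and the classification step follows: the unit computes a = F(W·c + b) with ReLU as the assumed default activation, and the classification unit is sketched as a softmax followed by argmax, which is an assumption since the patent only states that it produces the final class.

```python
import numpy as np

def linear_unit(c, W, b, activation=lambda z: np.maximum(z, 0.0)):
    """One fully connected layer: a = F(W x c + b); ReLU assumed as F."""
    return activation(W @ c + b)

def classify(a):
    """Classification unit: pick the class with the largest score."""
    scores = np.exp(a - np.max(a))
    probs = scores / np.sum(scores)  # softmax (an assumption; the patent leaves this open)
    return int(np.argmax(probs)), probs
```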

The present invention uses a piecewise nonlinear approximation to realize the Sigmoid function: on each interval of x, a different third-order polynomial y = Ax^3 + Bx^2 + Cx + D is fitted to the Sigmoid function, where A, B, C, and D are the coefficients of the cubic fit. The implementation structure is shown in Fig. 6; the flow for realizing the Sigmoid function in the hardware acceleration terminal with the fitted cubics is:

(1) x in different intervals uses different coefficients. The coefficients A, B, C, and D for each interval are stored in advance in the on-chip memory of the hardware acceleration terminal, and the coefficients corresponding to an input x are fetched from it.

(2) A selector chooses the output: when the input x is non-negative, the output is Ax^3 + Bx^2 + Cx + D; when x is negative, the output is 1 - (Ax^3 + Bx^2 + Cx + D).
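The piecewise cubic scheme can be reproduced in software to check its accuracy. The sketch below fits one cubic per interval on the non-negative half-axis with numpy.polyfit and mirrors negative inputs via sigmoid(-x) = 1 - sigmoid(x), matching the selector in step (2); the interval boundaries are assumptions made for illustration.

```python
import numpy as np

BOUNDS = [0.0, 1.0, 2.0, 4.0, 8.0]   # assumed segment boundaries on x >= 0

def fit_segments():
    """Fit one cubic y = Ax^3 + Bx^2 + Cx + D per segment (the stored A..D)."""
    coeffs = []
    for lo, hi in zip(BOUNDS[:-1], BOUNDS[1:]):
        xs = np.linspace(lo, hi, 200)
        ys = 1.0 / (1.0 + np.exp(-xs))
        coeffs.append(np.polyfit(xs, ys, 3))   # returns [A, B, C, D]
    return coeffs

COEFFS = fit_segments()

def sigmoid_approx(x):
    """Step (1): fetch A..D for |x|; step (2): mirror for negative x."""
    ax = min(abs(x), BOUNDS[-1] - 1e-9)
    idx = np.searchsorted(BOUNDS, ax, side="right") - 1
    y = np.polyval(COEFFS[idx], ax)
    return y if x >= 0 else 1.0 - y
```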

While the hardware acceleration terminal accelerates sub-network inference, the data input module and the configuration cache module store data through the terminal's on-chip and off-chip memories, so the terminal must obtain the addresses of those memories. In the present invention, the memory addresses are generated by the control terminal, and the memory access mode associated with each address is determined by the configuration information in the configuration file; the access modes include a main access mode, a data access mode, and a weight access mode.

The main access mode is used for data exchange between on-chip and off-chip memory; the data access mode is used to read data from on-chip memory into the data input module and to store the final classification result of the core computing module back to memory; the weight access mode is used to read the weight parameters from on-chip memory.
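The three access modes could be modeled as a small enumeration; the names below simply restate the text, and the mapping of duties to modes is otherwise an illustrative assumption.

```python
from enum import Enum

class AccessMode(Enum):
    """Memory access modes named in the configuration file (illustrative)."""
    MAIN = "main"      # data exchange between on-chip and off-chip memory
    DATA = "data"      # read input data to the data input module;
                       # store the final classification result back to memory
    WEIGHT = "weight"  # read weight parameters from on-chip memory
```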

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to examples, those of ordinary skill in the art should understand that the technical scheme may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications fall within the scope of the claims of the present invention.

Claims (6)

1. A neural network dynamic acceleration platform, characterized by comprising a control terminal and a hardware acceleration terminal, wherein:

the control terminal is used to train a controller neural network; according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, the controller neural network updates the structure of the sub-neural network and generates its structure parameters, retrains the updated sub-network to generate weight parameters, and generates a configuration file, containing the structure parameters and weight parameters, that is sent to the hardware acceleration terminal; once the inference accuracy returned by the hardware acceleration terminal stabilizes, the controller stops updating the sub-network, i.e. the optimal structure of the sub-neural network on the hardware acceleration terminal has been found;

the sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal;

the hardware acceleration terminal is an ASIC-based hardware inference accelerator for the sub-neural network; it accepts the configuration file generated by the control terminal, realizes the sub-network in hardware, accelerates its inference, and feeds the inference accuracy back to the control terminal, ultimately searching out the optimal structure of the sub-network and completing hardware-accelerated inference of the optimally structured sub-network.

2. The neural network dynamic acceleration platform of claim 1, characterized in that:

the structure of the sub-neural network comprises the numbers of convolutional, pooling, and fully connected layers, the structure of the convolutional layers, the structure of the pooling layers, and the connections between layers; a convolutional-layer structure comprises the number, size, and stride of its kernels, and a pooling-layer structure comprises the pooling-window size and stride;

the parameters in the configuration file include: the numbers of convolutional, pooling, and fully connected layers and the connections between layers; the structure of each convolutional layer (number, size, and stride of kernels); the structure of each pooling layer (pooling-window size and stride); the weight parameters of the retrained sub-neural network; and the memory addresses and memory access modes;

the hardware acceleration terminal comprises a data input module, a configuration cache module, a core computing module, and a data output module; the core computing module comprises convolution units, pooling units, linear units, and a classification unit, the numbers of convolution, pooling, and linear units corresponding respectively to the numbers of convolutional, pooling, and fully connected layers in the configuration file; the hardware structure of each convolution unit is determined by the convolutional-layer structure parameters in the configuration file, that of each pooling unit by the pooling-layer structure parameters, and the hardware connections between units in the core computing module by the layer-connection parameters in the configuration file;

the data input module divides the input into n single-channel data blocks according to the number of input channels; the configuration cache module buffers the configuration file generated by the control terminal; in the core computing module, the m kernels of a convolution unit convolve the n data blocks in turn with stride k, producing m feature maps, where m and k correspond to the kernel count and stride in the configuration file; the feature maps are passed to a pooling unit for feature subsampling, after which the linear units and the classification unit produce the classification result while the inference accuracy of the sub-neural network is computed; finally, the data output unit outputs the classification result and the inference accuracy.

3. The neural network dynamic acceleration platform of claim 1, characterized in that on the control side the controller neural network is a recurrent neural network whose computation proceeds as follows:

A1) the inputs x_t and h_{t-1} are processed along two paths, one of which multiplies x_t and h_{t-1} to produce the current memory cell c_t, where x_t is the sequence formed from the inference accuracy fed back by the sub-neural network and the accuracy requirement, and h_{t-1} is the sub-network structure parameter output by the controller in the previous step;

A2) the product is activated with the ReLU function: ReLU(h_{t-1} × x_t);

A3) the other path first adds x_t and h_{t-1} and applies the tanh activation: tanh(h_{t-1} + x_t);

A4) the result of the previous step is added to the previous memory cell c_{t-1};

A5) the ReLU activation is applied again: ReLU(tanh(h_{t-1} + x_t) + c_{t-1});

A6) the results of the two paths are multiplied and passed through the sigmoid function, giving the final output

h_t = sigmoid(ReLU(x_t × h_{t-1}) × ReLU(tanh(h_{t-1} + x_t) + c_{t-1}))

where h_t represents the structure parameters of the sub-neural network output by the controller.

4. The neural network dynamic acceleration platform of claim 2, characterized in that, on the hardware acceleration side:

the data input module divides the input into n single-channel data blocks according to the number of input channels;

the configuration cache module buffers the configuration file generated by the control terminal and is read by the core computing module;

each convolution unit performs the computation of one convolutional layer: the m kernels of the layer convolve the n data blocks in turn with stride k, producing m feature maps, and this is done l times, where n is the number of input channels and l, m, and k denote the number of convolution units, the number of kernels per unit, and the convolution stride, corresponding respectively to the number of convolutional layers, the kernel count, and the stride in the configuration file;

each pooling unit performs the computation of one pooling layer, accepting data from a convolution unit and subsampling the features; this is done o times, where o is the number of pooling units, corresponding to the number of pooling layers in the configuration file;

each linear unit performs the computation of one fully connected layer, accepting data from a pooling unit and computing a = F(W_i × c + b); this is repeated T times, and the output is handed to the classification unit to produce the classification result, which is finally sent to the data output module; here W_i is a weight parameter, T is the number of linear units, corresponding to the number of fully connected layers in the configuration file, c is the data output by the pooling unit, b is the bias of the sub-neural network, a is the output of the fully connected layer, and F(·) is the activation function.

5. The neural network dynamic acceleration platform of claim 4, characterized in that F(·) is the Sigmoid or ReLU function; on the hardware acceleration side, a piecewise nonlinear approximation realizes the Sigmoid function: on each interval of x, a different third-order polynomial y = Ax^3 + Bx^2 + Cx + D is fitted to the Sigmoid function, A, B, C, and D being the coefficients of the cubic fit; the flow for realizing the Sigmoid function in the hardware acceleration terminal with the fitted cubics is:

(1) x in different intervals uses different coefficients; the coefficients A, B, C, and D for each interval are stored in advance in the on-chip memory of the hardware acceleration terminal, and the coefficients corresponding to an input x are fetched from it;

(2) a selector chooses the output: when the input x is non-negative, the output is Ax^3 + Bx^2 + Cx + D; when x is negative, the output is 1 - (Ax^3 + Bx^2 + Cx + D).

6. A design method for a neural network dynamic acceleration platform based on optimal structure search, characterized by comprising:

S1. the controller neural network on the control terminal of the platform updates the structure of the sub-neural network and generates its structure parameters according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement; the control terminal retrains the updated sub-network to generate its weight parameters and finally generates the memory addresses and access modes; a configuration file is formed from the structure parameters, weight parameters, memory addresses, and access modes;

S2. the configuration file is written into the configuration cache module on the hardware acceleration terminal; the numbers of convolution, pooling, and linear units in the core computing module are updated according to the numbers of convolutional, pooling, and fully connected layers in the configuration file; the connections between units are updated according to the layer-connection parameters in the configuration file; and the structures of the convolution and pooling units are updated according to the structure parameters of the convolutional and pooling layers;

S3. input data are read through the data input unit and convolved by the convolution units; the classification result is then obtained through the linear units, pooling units, and classification unit, while the inference accuracy of the sub-neural network is computed; finally, the classification result and the inference accuracy are output through the data output unit;

S4. the inference accuracy of the sub-neural network is fed back to the control terminal again; the controller neural network updates the sub-network once more according to this accuracy and the preset accuracy requirement and generates new structure parameters; the control terminal retrains the updated sub-network, generates its weight parameters, and generates the memory addresses and access modes, from which a new configuration file is formed;

S5. the process iterates in this way: once the sub-network accuracy returned by the hardware acceleration terminal stabilizes, the controller neural network stops updating the sub-network's structure and the structure parameters remain unchanged; when the controller stops updating, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found.
CN201910175975.0A 2019-03-08 2019-03-08 Neural Network Dynamic Acceleration Platform Design Method and Neural Network Dynamic Acceleration Platform Based on Optimal Structure Search Active CN109934336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural Network Dynamic Acceleration Platform Design Method and Neural Network Dynamic Acceleration Platform Based on Optimal Structure Search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural Network Dynamic Acceleration Platform Design Method and Neural Network Dynamic Acceleration Platform Based on Optimal Structure Search

Publications (2)

Publication Number Publication Date
CN109934336A true CN109934336A (en) 2019-06-25
CN109934336B CN109934336B (en) 2023-05-16

Family

ID=66986513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910175975.0A Active CN109934336B (en) 2019-03-08 2019-03-08 Neural Network Dynamic Acceleration Platform Design Method and Neural Network Dynamic Acceleration Platform Based on Optimal Structure Search

Country Status (1)

Country Link
CN (1) CN109934336B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110673786A (en) * 2019-09-03 2020-01-10 浪潮电子信息产业股份有限公司 Method and device for data caching
CN111027689A (en) * 2019-11-20 2020-04-17 中国航空工业集团公司西安航空计算技术研究所 Configuration method, device and computing system
RU2721181C1 (en) * 2019-08-19 2020-05-18 Бейджин Сяоми Интеллиджент Текнолоджи Ко., Лтд. Super network construction method, method of use, device and data medium
CN111340221A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Method and device for sampling neural network structure
CN111782398A (en) * 2020-06-29 2020-10-16 上海商汤智能科技有限公司 Data processing method, apparatus, system and related equipment
CN111931926A (en) * 2020-10-12 2020-11-13 南京风兴科技有限公司 Hardware acceleration system and control method for convolutional neural network CNN
CN112106077A (en) * 2019-10-30 2020-12-18 深圳市大疆创新科技有限公司 Method, apparatus, storage medium, and computer program product for network structure search
CN112561028A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Method for training neural network model, and method and device for data processing
CN112561027A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Neural network architecture searching method, image processing method, device and storage medium
CN112613605A (en) * 2020-12-07 2021-04-06 深兰人工智能(深圳)有限公司 Neural network acceleration control method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247992A (en) * 2014-12-30 2017-10-13 合肥工业大学 A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment
CN108764466A (en) * 2018-03-07 2018-11-06 东南大学 Convolutional neural networks hardware based on field programmable gate array and its accelerated method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247992A (en) * 2014-12-30 2017-10-13 合肥工业大学 A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108764466A (en) * 2018-03-07 2018-11-06 东南大学 Convolutional neural networks hardware based on field programmable gate array and its accelerated method
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2721181C1 (en) * 2019-08-19 2020-05-18 Бейджин Сяоми Интеллиджент Текнолоджи Ко., Лтд. Super network construction method, method of use, device and data medium
CN110673786A (en) * 2019-09-03 2020-01-10 浪潮电子信息产业股份有限公司 Method and device for data caching
US11803475B2 (en) 2019-09-03 2023-10-31 Inspur Electronic Information Industry Co., Ltd. Method and apparatus for data caching
CN112561027A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Neural network architecture searching method, image processing method, device and storage medium
CN112561028A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Method for training neural network model, and method and device for data processing
CN112561027B (en) * 2019-09-25 2025-02-07 华为技术有限公司 Neural network architecture search method, image processing method, device and storage medium
CN112106077A (en) * 2019-10-30 2020-12-18 深圳市大疆创新科技有限公司 Method, apparatus, storage medium, and computer program product for network structure search
WO2021081809A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Network architecture search method and apparatus, and storage medium and computer program product
CN111027689A (en) * 2019-11-20 2020-04-17 中国航空工业集团公司西安航空计算技术研究所 Configuration method, device and computing system
CN111027689B (en) * 2019-11-20 2024-03-22 中国航空工业集团公司西安航空计算技术研究所 Configuration method, device and computing system
CN111340221A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Method and device for sampling neural network structure
CN111340221B (en) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 Neural network structure sampling method and device
CN111782398A (en) * 2020-06-29 2020-10-16 上海商汤智能科技有限公司 Data processing method, apparatus, system and related equipment
CN111782398B (en) * 2020-06-29 2024-12-13 上海商汤智能科技有限公司 Data processing method, device, system and related equipment
CN111931926A (en) * 2020-10-12 2020-11-13 南京风兴科技有限公司 Hardware acceleration system and control method for convolutional neural network CNN
CN112613605A (en) * 2020-12-07 2021-04-06 深兰人工智能(深圳)有限公司 Neural network acceleration control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109934336B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN109934336B (en) Neural Network Dynamic Acceleration Platform Design Method and Neural Network Dynamic Acceleration Platform Based on Optimal Structure Search
CN111639240B (en) Cross-modal Hash retrieval method and system based on attention awareness mechanism
KR20190104406A (en) Treatment method and device
CN109063825A (en) Convolutional neural networks accelerator
CN111126602A (en) A Recurrent Neural Network Model Compression Method Based on Convolution Kernel Similarity Pruning
CN111709516B (en) Compression method and compression device, storage medium and equipment of neural network model
CN112734025B (en) A Neural Network Parameter Sparse Method Based on Fixed Base Regularization
CN110490298A (en) Lightweight depth convolutional neural networks model based on expansion convolution
CN110543939A (en) A hardware-accelerated implementation architecture of FPGA-based convolutional neural network backward training
WO2022222649A1 (en) Neural network model training method and apparatus, device, and storage medium
Guan et al. Recursive binary neural network training model for efficient usage of on-chip memory
CN115329744B (en) A natural language processing method, system, device and storage medium
CN115860062A (en) A neural network quantization method and device suitable for FPGA
CN115587628A (en) Deep convolutional neural network lightweight method
CN117035045A (en) Model parameter updating method, device, equipment, storage medium and program product
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN113268565A (en) Method and device for quickly generating word vector based on concept text
CN112183744A (en) Neural network pruning method and device
US20210248008A1 (en) Hardware Accelerator for Natural Language Processing Applications
CN112100376A (en) Mutual Reinforcement Transformation Networks for Fine-Grained Sentiment Analysis
CN110221839A (en) A kind of hardware accelerator VerilogHDL code automatic generation method
CN114154616B (en) RNN parallel model and method and system for implementing RNN parallel model on multi-core CPU
CN116976421A (en) Model compression method, device, electronic equipment and readable storage medium
CN115345874A (en) Lightweight fish bait particle counting method, system, electronic device and medium
CN115391662A (en) Personalized recommendation method and device based on article attribute sampling and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant