CN109934336B - Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform


Info

Publication number
CN109934336B
CN109934336B
Authority
CN
China
Prior art keywords
neural network
sub
convolution
pooling
layer
Prior art date
Legal status
Active
Application number
CN201910175975.0A
Other languages
Chinese (zh)
Other versions
CN109934336A (en)
Inventor
虞致国
马晓杰
顾晓峰
魏敬和
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN201910175975.0A
Publication of CN109934336A
Application granted
Publication of CN109934336B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a neural network dynamic acceleration platform design method based on optimal structure search. The neural network dynamic acceleration platform comprises a control end and a hardware acceleration end. The control end is used for training a control neural network; the control neural network updates the structure of a sub-neural network according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware acceleration end, the configuration file comprising the structure parameters and weight parameters of the sub-neural network. Once the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the optimal structure of the sub-neural network on the hardware acceleration end has been found. The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end. The invention can dynamically search for the optimal structure of the sub-neural network requiring hardware inference acceleration.

Description

Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform
Technical Field
The invention relates to the field of intelligent computing platforms, in particular to a neural network dynamic acceleration platform design method based on optimal network structure search.
Background
Deep neural networks (DNNs) have shown tremendous value; among them, convolutional neural networks (CNNs) hold significant advantages over traditional image recognition schemes. As application demands grow, deeper networks and larger databases have become the main technical line of CNN development. At the same time, the use of deep neural networks faces several problems:
(1) Training a convolutional neural network takes considerable time. The CNN algorithm realizes convolution mainly through a large number of multiplication operations: the CNN model for recognizing handwritten fonts proposed by LeCun et al. in 1998 requires fewer than 2.3×10⁷ multiplication operations, the CNN model named AlexNet designed by Krizhevsky et al. in 2012 requires 1.1×10⁹, and the CNN model proposed by Simonyan and Zisserman in 2015 requires more than 1.6×10¹⁰.
(2) Large deep neural networks consume considerable power in operation and run inefficiently on general-purpose processors, because the model must be stored in external DRAM and invoked in real time during inference on images or speech. The table below shows the power consumption of basic operations and storage accesses in a 45 nm CMOS processor. Without optimization of the network structure and of the hardware architecture, accesses to and operations on model data occupy a large share of the power consumption, which is unacceptable, especially for embedded mobile terminals.
Table 1. Power consumption of basic operations and storage accesses in a 45 nm CMOS processor [table provided as an image in the original document]
In view of these requirements and challenges, and of conditions such as hardware resources and precision requirements, a design method for a neural network dynamic acceleration platform based on optimal structure search is urgently needed to meet demands such as high hardware computing efficiency and a high performance-to-power ratio.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a neural network dynamic acceleration platform design method based on optimal structure search and a neural network dynamic acceleration platform, which can dynamically search for the optimal structure of a sub-neural network requiring hardware inference acceleration and complete the inference acceleration of that sub-neural network. The technical scheme adopted by the invention is as follows:
a neural network dynamic acceleration platform design method based on optimal structure search comprises the following steps:
s1, a control neural network on a control end of a neural network dynamic acceleration platform updates a structure of a sub-neural network and generates sub-neural network structure parameters according to inference accuracy fed back by the sub-neural network and accuracy requirements of preset accuracy, and meanwhile, the control end retrains the sub-neural network with the updated structure to generate weight parameters of the sub-neural network and finally generates addresses and access modes of a memory; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s2, writing a configuration file into a configuration buffer module of a hardware acceleration end in the dynamic acceleration platform of the nerve network, respectively updating the number of convolution calculation units, pooling units and linear calculation units in a core calculation module according to the number of convolution layers, the number of pooling layers and the number of full-connection layers in the configuration file, respectively updating the connection modes of the units according to parameters representing the connection modes of the layers in the configuration file, and respectively updating the structures of the convolution calculation units and the pooling units according to the structural parameters of the convolution layers and the pooling layers;
s3, reading input data through a data input unit, performing convolution operation on the input data through a convolution calculation unit, obtaining a classification result through a linear operation unit, a pooling unit and a classification unit, calculating the inference accuracy of the sub-neural network, and finally outputting the classification result and the inference accuracy through a data output unit;
s4, the inference accuracy of the sub-neural network is fed back to the control end again, the control neural network updates the sub-neural network again through the inference accuracy of the sub-neural network and the accuracy requirement of the preset accuracy, new sub-neural network structure parameters are generated, meanwhile, the control end retrains the sub-neural network with the updated structure, weight parameters of the sub-neural network are generated, and finally addresses and access modes of the memory are generated; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s5, repeatedly iterating in this way, and controlling the neural network to not update the structure of the neural network after the accuracy of the neural network returned by the hardware acceleration end is kept stable, wherein the structural parameters of the neural network are kept unchanged; when the control neural network is not updated, the optimal structure of the sub-neural network on the hardware acceleration end is searched.
A neural network dynamic acceleration platform, comprising: a control end and a hardware acceleration end;
The control end is used for training the control neural network; the control neural network updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware acceleration end, the configuration file comprising the structure parameters and weight parameters of the sub-neural network; after the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the control neural network no longer updates the sub-neural network, i.e., the optimal structure of the sub-neural network on the hardware acceleration end has been found;
The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end;
The hardware acceleration end is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control end, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control end, finally achieving hardware inference acceleration of the sub-neural network with the optimal structure.
In particular:
The structure of the sub-neural network comprises the number of convolution layers, the number of pooling layers, the number of fully connected layers, the structure of the convolution layers, the structure of the pooling layers and the connection mode of each layer; the convolution layer structure comprises the number, size and stride of the convolution kernels, and the pooling layer structure comprises the size and stride of the pooling window;
the parameters in the configuration file include:
the number of convolution layers, the number of pooling layers, the number of fully connected layers and the connection mode of each layer;
structure of the convolution layer: the number, size and stride of the convolution kernels;
structure of the pooling layer: pooling window size and stride;
each weight parameter of the trained sub-neural network;
memory addresses and memory access modes;
The hardware acceleration end comprises a data input module, a configuration buffer module, a core calculation module and a data output module; the core calculation module comprises convolution calculation units, pooling units, linear calculation units and a classification unit; the numbers of convolution calculation units, pooling units and linear calculation units correspond, respectively, to the numbers of convolution layers, pooling layers and fully connected layers in the configuration file; the hardware structure of the convolution calculation units is determined by the convolution layer structure parameters in the configuration file, and the hardware structure of the pooling units is determined by the pooling layer structure parameters in the configuration file; the hardware connection mode of the units in the core calculation module is determined by the connection mode of each layer in the configuration file;
The data input module divides the data into n single-channel data blocks according to the number of input data channels; the configuration buffer module buffers the configuration file generated on the control end; the core calculation module uses the m convolution kernels in the convolution calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride in the convolution calculation unit and correspond, respectively, to the number of convolution kernels and the stride in the configuration file; the feature maps are passed to the pooling unit for feature sampling, classification results are generated through the linear calculation unit and the classification unit, the inference accuracy of the sub-neural network is calculated at the same time, and finally the classification results and the inference accuracy are output through the data output unit.
The invention has the following advantages: the neural network dynamic acceleration platform design method based on optimal structure search can, under conditions of fixed hardware resources, variable precision and changing external parameters, dynamically search for the optimal structure of the sub-neural network on the hardware acceleration end and accelerate inference of the optimally structured sub-neural network; the method can meet the demands of neural network processing chips for a high performance-to-power ratio, low latency and variable precision, and solves the problem that prior-art processors cannot be applied efficiently to multiple functions and multiple platforms at the same time.
Drawings
Fig. 1 is a hierarchical diagram of a neural network dynamic acceleration platform based on optimal structure search according to the present invention.
Fig. 2 is a schematic diagram of the structure of the neural network dynamic acceleration platform according to the present invention.
Fig. 3 is a schematic diagram of a calculation flow of the control neural network according to the present invention.
Fig. 4 is a schematic diagram of the convolution calculation unit according to the present invention.
Fig. 5 is a schematic diagram of a linear computing unit and a classifying unit according to the present invention.
Fig. 6 is a schematic diagram of the piecewise nonlinear approximation of the Sigmoid function according to the present invention.
Detailed Description
The invention will be further described with reference to the following specific drawings and examples.
Fig. 1 shows a hierarchical diagram of the neural network dynamic acceleration platform (hereinafter, the acceleration platform) based on optimal structure search according to the present invention; the platform comprises a control layer, an application layer, a connection layer and a hardware layer;
The control layer and the application layer belong to the software layer; the control layer completes training of the control neural network and searches for the optimal structure of the sub-neural network, and also completes retraining of the optimally structured sub-neural network;
In the application layer, the user performs data input to the hardware acceleration end by calling a supported hardware programming interface;
The connection layer transfers the configuration file, formed from the sub-neural network structure parameters, weight parameters and the like, as well as the inference accuracy of the sub-neural network fed back by the hardware acceleration end;
The hardware layer mainly provides the inference acceleration function for the sub-neural network and comprises the data input module, configuration buffer module, core calculation module, data output module and other modules;
Fig. 2 shows a schematic structural diagram of the neural network dynamic acceleration platform based on optimal structure search provided by the invention; the acceleration platform comprises a control end and a hardware acceleration end;
The control end is a server containing a graphics processor and is used for training the control neural network; the control neural network updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement and generates sub-neural network structure parameters; the updated sub-neural network is retrained to generate weight parameters; meanwhile, a configuration file comprising the structure parameters and weight parameters of the sub-neural network is generated and sent to the hardware acceleration end; after the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the control neural network no longer updates the sub-neural network, i.e., the optimal structure of the sub-neural network on the hardware acceleration end has been found;
The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end; the structure of the sub-neural network comprises the number of convolution layers, the number of pooling layers, the number of fully connected layers, the structure of the convolution layers, the structure of the pooling layers and the connection mode of each layer; the convolution layer structure comprises the number, size and stride of the convolution kernels, and the pooling layer structure comprises the size and stride of the pooling window;
The parameters in the configuration file include the following (an illustrative example is given after this list):
the number of convolution layers, the number of pooling layers, the number of fully connected layers and the connection mode of each layer;
structure of the convolution layer: the number, size and stride of the convolution kernels;
structure of the pooling layer: pooling window size and stride;
each weight parameter of the trained sub-neural network;
memory addresses and memory access modes;
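For readability, the contents listed above can be pictured as a simple key-value structure. The following Python sketch is purely illustrative; every field name and value in it is an assumption for demonstration, not the actual file format of the invention:

```python
# Hypothetical illustration of the configuration file contents listed above;
# all field names and values here are assumptions for demonstration only.
config = {
    "num_conv_layers": 2,             # number of convolution layers
    "num_pool_layers": 2,             # number of pooling layers
    "num_fc_layers": 1,               # number of fully connected layers
    "connections": ["conv0", "pool0", "conv1", "pool1", "fc0"],  # layer connection mode
    "conv_layers": [                  # structure of each convolution layer
        {"kernels": 16, "kernel_size": 5, "stride": 1},
        {"kernels": 32, "kernel_size": 3, "stride": 1},
    ],
    "pool_layers": [                  # structure of each pooling layer
        {"window": 2, "stride": 2},
        {"window": 2, "stride": 2},
    ],
    "weights": "subnet_weights.bin",  # trained weight parameters of the sub-network
    "memory": {
        "base_address": 0x40000000,   # memory address generated by the control end
        "access_modes": ["master", "data", "weight"],  # memory access modes
    },
}
```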
The hardware acceleration end is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control end, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control end, finally achieving hardware inference acceleration of the sub-neural network with the optimal structure;
The hardware acceleration end comprises a data input module, a configuration buffer module, a core calculation module and a data output module; the core calculation module comprises convolution calculation units, pooling units, linear calculation units and a classification unit; the numbers of convolution calculation units, pooling units and linear calculation units correspond, respectively, to the numbers of convolution layers, pooling layers and fully connected layers in the configuration file; the hardware structure of the convolution calculation units is determined by the convolution layer structure parameters in the configuration file, and the hardware structure of the pooling units is determined by the pooling layer structure parameters in the configuration file; the hardware connection mode of the units in the core calculation module is determined by the connection mode of each layer in the configuration file;
The data input module divides the data into n single-channel data blocks according to the number of input data channels; the configuration buffer module buffers the configuration file generated on the control end; the core calculation module uses the m convolution kernels in the convolution calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride in the convolution calculation unit (corresponding, respectively, to the number of convolution kernels and the stride in the configuration file); the feature maps are passed to the pooling unit for feature sampling, classification results are generated through the linear calculation unit and the classification unit, the inference accuracy of the sub-neural network is calculated at the same time, and finally the classification results and the inference accuracy are output through the data output unit;
the invention provides a neural network dynamic acceleration platform design method based on optimal structure search, which mainly comprises the following steps:
s1, a control neural network on a control end of a neural network dynamic acceleration platform updates a structure of a sub-neural network and generates sub-neural network structure parameters according to inference accuracy fed back by the sub-neural network and accuracy requirements of preset accuracy, and meanwhile, the control end retrains the sub-neural network with the updated structure to generate weight parameters of the sub-neural network and finally generates addresses and access modes of a memory; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s2, writing a configuration file into a configuration buffer module of a hardware acceleration end in the dynamic acceleration platform of the nerve network, respectively updating the number of convolution calculation units, pooling units and linear calculation units in a core calculation module according to the number of convolution layers, the number of pooling layers and the number of full-connection layers in the configuration file, respectively updating the connection modes of the units according to parameters representing the connection modes of the layers in the configuration file, and respectively updating the structures of the convolution calculation units and the pooling units according to the structural parameters of the convolution layers and the pooling layers;
s3, reading input data through a data input unit, performing convolution operation on the input data through a convolution calculation unit, obtaining a classification result through a linear operation unit, a pooling unit and a classification unit, calculating the inference accuracy of the sub-neural network, and finally outputting the classification result and the inference accuracy through a data output unit;
s4, the inference accuracy of the sub-neural network is fed back to the control end again, the control neural network updates the sub-neural network again through the inference accuracy of the sub-neural network and the accuracy requirement of the preset accuracy, new sub-neural network structure parameters are generated, meanwhile, the control end retrains the sub-neural network with the updated structure, weight parameters of the sub-neural network are generated, and finally addresses and access modes of the memory are generated; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s5, repeatedly iterating in this way, and controlling the neural network to not update the structure of the neural network after the accuracy of the neural network returned by the hardware acceleration end is kept stable, wherein the structural parameters of the neural network are kept unchanged; when the control neural network is not updated, the optimal structure of the sub-neural network on the hardware acceleration end is searched.
By this method, optimal structure search can be carried out dynamically for the sub-neural network on the hardware acceleration end, and inference acceleration of the optimally structured sub-neural network is completed.
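To make the iteration concrete, the following Python sketch outlines steps S1 to S5 as a feedback loop; the callables controller_update, retrain and hardware_infer, the stopping tolerance eps, and all other names are illustrative assumptions, not the patented implementation:

```python
# Minimal sketch of the S1-S5 feedback loop under the assumptions stated above.
def search_optimal_structure(controller_update, retrain, hardware_infer,
                             target_acc, max_iters=50, eps=1e-3):
    accuracy, prev_acc, structure = 0.0, None, None
    for _ in range(max_iters):
        # S1: the controller emits new structure parameters from the fed-back
        # accuracy and the preset accuracy requirement; the sub-network is
        # then retrained to produce weight parameters
        structure = controller_update(accuracy, target_acc)
        weights = retrain(structure)
        config = {"structure": structure, "weights": weights}
        # S2/S3: the accelerator is reconfigured from the configuration file
        # and runs inference, returning the sub-network's inference accuracy
        accuracy = hardware_infer(config)
        # S4/S5: feed the accuracy back and stop once it has stabilized,
        # i.e. the optimal structure has been found
        if prev_acc is not None and abs(accuracy - prev_acc) < eps:
            break
        prev_acc = accuracy
    return structure, accuracy
```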
In particular, the method comprises the steps of,
On the control end, the control neural network is a recurrent neural network capable of generating the configuration file;
the recurrent neural network is a tree structure, as shown in fig. 3, and the specific calculation flow is as follows:
a1 Input x) t And h t-1 The calculation is divided into two paths, one path is opposite to x t And h t-1 Multiplication generates the current level memory cell c t Wherein x is t A sequence synthesized for deducing accuracy and accuracy precision requirements of the accuracy of the sub-neural network feedback, h t-1 The structural parameters of the sub-neural network output by the neural network are controlled for the previous time;
a2 Activating the result of the previous multiplication by using an activation function ReLU to perform ReLU (h) t-1 ×x t ) Operating;
a3 For x in another path t And h t-1 Adding and performing tanh (h t-1 +x t ) Operating;
a4 A previous step result and a previous stage memory unit c t-1 Adding;
a5 Activated again using the ReLU activation function, performing a ReLU (tanh (h) t-1 +x t )+c t-1 ) Operating;
a6 After multiplying the two paths of results, inputting a sigmoid function, and finally outputting the following results:
h t =sigmoid(ReLU(x t ×h t-1 )×ReLU(tanh(h t-1 +x t )+c t-1 ))
h t a structural parameter representing a sub-neural network controlling the output of the neural network;
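For illustration, steps a1) to a6) can be written directly as code. The following NumPy sketch assumes xₜ, hₜ₋₁ and cₜ₋₁ are arrays of matching shape and treats every operation as elementwise; it is a software model of the cell, not the control end's actual implementation:

```python
import numpy as np

# Software model of the controller cell in steps a1)-a6); elementwise
# operations on equally shaped arrays are an assumption.
def controller_cell(x_t, h_prev, c_prev):
    relu = lambda v: np.maximum(v, 0.0)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_t = x_t * h_prev                    # a1): current-stage memory cell c_t
    path1 = relu(c_t)                     # a2): ReLU(h_{t-1} x x_t)
    path2 = np.tanh(h_prev + x_t)         # a3): tanh(h_{t-1} + x_t)
    path2 = relu(path2 + c_prev)          # a4)+a5): add c_{t-1}, then ReLU
    h_t = sigmoid(path1 * path2)          # a6): new structure parameters h_t
    return h_t, c_t
```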
After the inference accuracy of the sub-neural network returned by the hardware acceleration end remains stable, the control neural network no longer updates the structure of the sub-neural network, and the structure parameters of the sub-neural network remain unchanged;
On the hardware acceleration end:
The data input module divides the data into n single-channel data blocks according to the number of input data channels, the input data being a matrix converted from image data or signal data;
The configuration buffer module buffers the configuration file generated on the control end, which can be read by the core calculation module;
A convolution calculation unit performs the calculation of one convolution layer: the m convolution kernels of the layer convolve the n data blocks in turn with stride k, generating m feature maps; this is performed l times, where n is the number of input data channels and l, m and k denote, respectively, the number of convolution calculation units, the number of convolution kernels per unit and the convolution stride, corresponding to the number of convolution layers, the number of convolution kernels and the convolution stride in the configuration file; as shown in Fig. 4;
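As a software reference for this datapath, the following NumPy sketch convolves n single-channel blocks with m kernels at stride k and sums over the blocks to form each feature map; "valid" padding and the summation over blocks are interpretive assumptions, and the nested loops stand in for the parallel hardware:

```python
import numpy as np

# Software reference for one convolution calculation unit (assumptions noted
# in the text above): blocks has shape (n, H, W), kernels has shape (m, kh, kw).
def conv_unit(blocks, kernels, k):
    n, H, W = blocks.shape
    m, kh, kw = kernels.shape
    oh, ow = (H - kh) // k + 1, (W - kw) // k + 1
    fmaps = np.zeros((m, oh, ow))
    for j in range(m):                            # one feature map per kernel
        for r in range(oh):
            for c in range(ow):
                patch = blocks[:, r*k:r*k+kh, c*k:c*k+kw]
                fmaps[j, r, c] = np.sum(patch * kernels[j])  # sum over the n blocks
    return fmaps
```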
A pooling unit performs the calculation of one pooling layer, receiving data from the convolution calculation unit and performing feature sampling; this is performed o times, where o denotes the number of pooling units and corresponds to the number of pooling layers in the configuration file;
A linear calculation unit performs the calculation of one fully connected layer, receiving data from the pooling unit and performing the a = F(Wi × c + b) operation; this is repeated T times, and the result is delivered to the classification unit to obtain the classification result, which is finally output to the data output module; Wi is a weight parameter; T is the number of linear calculation units and corresponds to the number of fully connected layers in the configuration file; c is the data output by the pooling unit; b is a bias in the sub-neural network; a is the output of the fully connected layer; and F(x) denotes the activation function, typically a Sigmoid or ReLU function; as shown in Fig. 5;
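The fully connected computation above reduces to a single matrix-vector product per unit. A minimal sketch, with ReLU standing in for F and all shapes assumed:

```python
import numpy as np

# Minimal sketch of one linear calculation unit: a = F(Wi x c + b),
# with ReLU assumed as the activation F.
def linear_unit(c, Wi, b):
    relu = lambda v: np.maximum(v, 0.0)
    return relu(Wi @ c + b)   # Wi: weight matrix, c: pooling output, b: bias
```

Chaining T such units, with the last output handed to the classification unit, reproduces the flow described above.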
The invention approximates the Sigmoid function by piecewise nonlinear approximation: when x lies in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C and D are the coefficients of the third-order polynomial fit to the Sigmoid function; the implementation structure is shown in Fig. 6, and the flow for realizing the Sigmoid function in the hardware acceleration end with the fitted third-order polynomial is as follows:
(1) Different coefficients are used for x in different intervals; the coefficients A, B, C, D for each segment are stored in advance in the on-chip memory of the hardware acceleration end, and the corresponding coefficients A, B, C, D are fetched according to the input x;
(2) The output is selected by a selector: when the input x is non-negative, the output is Ax³ + Bx² + Cx + D; if the input x is negative, the output is 1 - (Ax³ + Bx² + Cx + D);
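A software model of this approximation can fit and evaluate the per-segment polynomials directly. In the sketch below the segment boundaries are assumptions and the coefficients are fitted on the fly with np.polyfit, whereas the accelerator reads pre-fitted A, B, C, D from on-chip memory; the negative branch uses the symmetry sigmoid(-x) = 1 - sigmoid(x) exactly as in step (2):

```python
import numpy as np

# Piecewise third-order approximation of the Sigmoid function; the segment
# edges below are assumed, and coefficients are fitted here rather than
# read from on-chip memory as in the accelerator.
BOUNDS = [0.0, 1.0, 2.0, 4.0, 8.0]                  # assumed segment edges for x >= 0
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
TABLE = []                                          # per-segment (lo, hi, A, B, C, D)
for lo, hi in zip(BOUNDS[:-1], BOUNDS[1:]):
    xs = np.linspace(lo, hi, 64)
    A, B, C, D = np.polyfit(xs, sigmoid(xs), 3)     # y = A*x**3 + B*x**2 + C*x + D
    TABLE.append((lo, hi, A, B, C, D))

def sigmoid_approx(x):
    neg = x < 0
    ax = min(abs(x), BOUNDS[-1] - 1e-9)             # clamp to the fitted range
    for lo, hi, A, B, C, D in TABLE:
        if lo <= ax <= hi:
            y = A*ax**3 + B*ax**2 + C*ax + D        # step (1): fetch coefficients, evaluate
            break
    return 1.0 - y if neg else y                    # step (2): 1 - y for negative x
```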
While the hardware acceleration end performs inference acceleration of the sub-neural network, the data input module and the configuration buffer module must store data through the on-chip memory of the hardware acceleration end and the off-chip memory, so the hardware acceleration end needs to obtain the addresses of the on-chip and off-chip memories; in the invention, the memory addresses are generated by the control end, the memory access mode associated with a memory address is determined by the configuration information in the configuration file, and the memory access modes include a master access mode, a data access mode, a weight access mode and the like;
The master access mode is used for data exchange between the on-chip and off-chip memories; the data access mode is used for reading data from the on-chip memory into the data input module and for storing the final classification result of the core calculation module into the memory; the weight access mode is used for reading weight parameter data from the on-chip memory;
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it; although the present invention has been described in detail with reference to examples, those skilled in the art should understand that modifications and equivalents may be made to the technical solution of the present invention without departing from its spirit and scope, all of which are intended to be encompassed within the scope of the claims of the present invention.

Claims (3)

1. A neural network dynamic acceleration platform, comprising: a control end and a hardware acceleration end;
the control end is used for training the control neural network; the control neural network updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware acceleration end, the configuration file comprising the structure parameters and weight parameters of the sub-neural network; after the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the control neural network no longer updates the sub-neural network, i.e., the optimal structure of the sub-neural network on the hardware acceleration end has been found;
the sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end;
the hardware acceleration end is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control end, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control end, finally achieving hardware inference acceleration of the sub-neural network with the optimal structure;
the structure of the sub-neural network comprises the number of convolution layers, the number of pooling layers, the number of fully connected layers, the structure of the convolution layers, the structure of the pooling layers and the connection mode of each layer; the convolution layer structure comprises the number, size and stride of the convolution kernels, and the pooling layer structure comprises the size and stride of the pooling window;
the parameters in the configuration file include:
the number of convolution layers, the number of pooling layers, the number of fully connected layers and the connection mode of each layer;
structure of the convolution layer: the number, size and stride of the convolution kernels;
structure of the pooling layer: pooling window size and stride;
each weight parameter of the trained sub-neural network;
memory addresses and memory access modes;
the hardware acceleration end comprises a data input module, a configuration buffer module, a core calculation module and a data output module; the core calculation module comprises convolution calculation units, pooling units, linear calculation units and a classification unit; the numbers of convolution calculation units, pooling units and linear calculation units correspond, respectively, to the numbers of convolution layers, pooling layers and fully connected layers in the configuration file; the hardware structure of the convolution calculation units is determined by the convolution layer structure parameters in the configuration file, and the hardware structure of the pooling units is determined by the pooling layer structure parameters in the configuration file; the hardware connection mode of the units in the core calculation module is determined by the connection mode of each layer in the configuration file;
the data input module divides the data into n single-channel data blocks according to the number of input data channels; the configuration buffer module buffers the configuration file generated on the control end; the core calculation module uses the m convolution kernels in the convolution calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride in the convolution calculation unit and correspond, respectively, to the number of convolution kernels and the stride in the configuration file; the feature maps are passed to the pooling unit for feature sampling, classification results are generated through the linear calculation unit and the classification unit, the inference accuracy of the sub-neural network is calculated at the same time, and finally the classification results and the inference accuracy are output through the data output unit;
at the control end, the control neural network is a recurrent neural network, whose calculation flow is as follows:
a1 Input x) t And h t-1 The calculation is divided into two paths, one path is opposite to x t And h t-1 Multiplication generates the current level memory cell c t Wherein x is t A sequence synthesized for deducing accuracy and accuracy precision requirements of the accuracy of the sub-neural network feedback, h t-1 The structural parameters of the sub-neural network output by the neural network are controlled for the previous time;
a2 Activating the result of the previous multiplication by using an activation function ReLU to perform ReLU (h) t-1 ×x t ) Operating;
a3 For x in another path t And h t-1 Adding and performing tanh (h t-1 +x t ) Operating;
a4 A previous step result and a previous stage memory unit c t-1 Adding;
a5 Activated again using the ReLU activation function, performing a ReLU (tanh (h) t-1 +x t )+c t-1 ) Operating;
a6 After multiplying the two paths of results, inputting a sigmoid function, and finally outputting the following results:
h t =sigmoid(ReLU(x t ×h t-1 )×ReLU(tanh(h t-1 +x t )+c t-1 ))
h t is a structural parameter representing a sub-neural network that controls the output of the neural network.
2. The neural network dynamic acceleration platform of claim 1,
wherein, on the hardware acceleration end:
the data input module is used for dividing data into n single-channel data blocks according to the number of input data channels;
the configuration buffer module is used for buffering the configuration file generated on the control end, and is read by the core calculation module;
a convolution calculation unit performs the calculation of one convolution layer: the m convolution kernels of the layer convolve the n data blocks in turn with stride k, generating m feature maps; this is performed l times, where n is the number of input data channels and l, m and k denote, respectively, the number of convolution calculation units, the number of convolution kernels per unit and the convolution stride, corresponding, respectively, to the number of convolution layers, the number of convolution kernels and the stride in the configuration file;
a pooling unit performs the calculation of one pooling layer, receiving data from the convolution calculation unit and performing feature sampling; this is performed o times, where o denotes the number of pooling units and corresponds to the number of pooling layers in the configuration file;
a linear calculation unit performs the calculation of one fully connected layer, receiving data from the pooling unit and performing the a = F(Wi × c + b) operation; this is repeated T times, and the result is delivered to the classification unit to obtain the classification result, which is finally output to the data output module; Wi is a weight parameter; T is the number of linear calculation units and corresponds to the number of fully connected layers in the configuration file; c is the data output by the pooling unit; b is a bias in the sub-neural network; a is the output of the fully connected layer; and F denotes the activation function.
3. The neural network dynamic acceleration platform of claim 2,
F is a Sigmoid or ReLU function; on the hardware acceleration end, the Sigmoid function is realized approximately by a piecewise nonlinear approximation method: when x lies in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C and D are the coefficients of the third-order polynomial fit to the Sigmoid function; the flow for realizing the Sigmoid function in the hardware acceleration end with the fitted third-order polynomial is as follows:
(1) Different coefficients are used for x in different intervals; the coefficients A, B, C, D for each segment are stored in advance in the on-chip memory of the hardware acceleration end, and the corresponding coefficients A, B, C, D are fetched according to the input x;
(2) The output is selected by a selector: when the input x is non-negative, the output is Ax³ + Bx² + Cx + D; if the input x is negative, the output is 1 - (Ax³ + Bx² + Cx + D).
CN201910175975.0A 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform Active CN109934336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Publications (2)

Publication Number Publication Date
CN109934336A CN109934336A (en) 2019-06-25
CN109934336B (en) 2023-05-16

Family

ID=66986513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910175975.0A Active CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Country Status (1)

Country Link
CN (1) CN109934336B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490303A (en) * 2019-08-19 2019-11-22 北京小米智能科技有限公司 Super-network construction method, application method, device and medium
CN110673786B (en) * 2019-09-03 2020-11-10 浪潮电子信息产业股份有限公司 Data caching method and device
CN112561028A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Method for training neural network model, and method and device for data processing
WO2021081809A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Network architecture search method and apparatus, and storage medium and computer program product
CN111027689B (en) * 2019-11-20 2024-03-22 中国航空工业集团公司西安航空计算技术研究所 Configuration method, device and computing system
CN111340221B (en) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 Neural network structure sampling method and device
CN111931926A (en) * 2020-10-12 2020-11-13 南京风兴科技有限公司 Hardware acceleration system and control method for convolutional neural network CNN
CN112613605A (en) * 2020-12-07 2021-04-06 深兰人工智能(深圳)有限公司 Neural network acceleration control method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247992A (en) * 2014-12-30 2017-10-13 合肥工业大学 A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment
CN108764466A (en) * 2018-03-07 2018-11-06 东南大学 Convolutional neural networks hardware based on field programmable gate array and its accelerated method


Also Published As

Publication number Publication date
CN109934336A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109934336B (en) Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform
US20200097806A1 (en) Processing method and accelerating device
US20200193297A1 (en) System and method for binary recurrent neural network inferencing
KR102434729B1 (en) Processing method and apparatus
WO2021208612A1 (en) Data processing method and device
CN108229671B (en) System and method for reducing storage bandwidth requirement of external data of accelerator
CN109523014B (en) News comment automatic generation method and system based on generative confrontation network model
CN111105023B (en) Data stream reconstruction method and reconfigurable data stream processor
US11314842B1 (en) Hardware implementation of mathematical functions
US11593628B2 (en) Dynamic variable bit width neural processor
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
CN110580519B (en) Convolution operation device and method thereof
JP2022534890A (en) Image processing method and apparatus, electronic equipment and storage medium
CN112633477A (en) Quantitative neural network acceleration method based on field programmable array
WO2020236255A1 (en) System and method for incremental learning using a grow-and-prune paradigm with neural networks
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
CN115017178A (en) Training method and device for data-to-text generation model
WO2022222649A1 (en) Neural network model training method and apparatus, device, and storage medium
WO2022028232A1 (en) Device and method for executing lstm neural network operation
CN109697507B (en) Processing method and device
US11507841B2 (en) Hardware accelerator for natural language processing applications
CN112183744A (en) Neural network pruning method and device
CN110009091B (en) Optimization of learning network in equivalence class space
CN115222028A (en) One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method
CN110334359B (en) Text translation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant