CN109934336B - Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform


Info

Publication number
CN109934336B
CN109934336B
Authority
CN
China
Prior art keywords
neural network
sub
convolution
pooling
layer
Prior art date
Legal status
Active
Application number
CN201910175975.0A
Other languages
Chinese (zh)
Other versions
CN109934336A (en)
Inventor
虞致国
马晓杰
顾晓峰
魏敬和
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN201910175975.0A
Publication of CN109934336A
Application granted
Publication of CN109934336B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a neural network dynamic acceleration platform design method based on optimal structure search. The neural network dynamic acceleration platform comprises a control end and a hardware acceleration end. The control end is used for training a control neural network; the control neural network updates the structure of a sub-neural network according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware acceleration end, the configuration file comprising the structure parameters and weight parameters of the sub-neural network. Once the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the optimal structure of the sub-neural network on the hardware acceleration end has been found. The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end. The invention can dynamically search for the optimal structure of the sub-neural network requiring hardware inference acceleration.

Description

Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform
Technical Field
The invention relates to the field of intelligent computing platforms, in particular to a neural network dynamic acceleration platform design method based on optimal network structure search.
Background
Deep neural networks (DNNs) have shown tremendous value; among them, convolutional neural networks (CNNs) hold significant advantages over traditional image recognition schemes. As application demands grow, deeper networks and larger databases have become the main technical line of CNN development. At the same time, the use of deep neural networks faces several problems:
(1) Training a convolutional neural network takes considerable time. The CNN algorithm realizes convolution mainly through a large number of multiplication operations: the CNN model for recognizing handwritten fonts proposed by LeCun et al. in 1998 requires fewer than 2.3×10⁷ multiplication operations, the CNN model named AlexNet designed by Krizhevsky et al. in 2012 requires 1.1×10⁹, and the CNN model proposed by Simonyan and Zisserman in 2015 requires more than 1.6×10¹⁰.
(2) Large deep neural networks consume considerable power in operation and run inefficiently on general-purpose processors, because the model must be stored in external DRAM and invoked in real time during inference on images or speech. The table below shows the power consumption of basic operations and storage accesses in a 45 nm CMOS processor. Without optimization of the network structure and of the hardware architecture, accesses to and operations on model data occupy a large share of the power consumption, which is unacceptable, especially for embedded mobile terminals.
Table 1. Power consumption of basic operations and storage accesses in a 45 nm CMOS processor [table provided as an image in the original document]
In view of these requirements and challenges, and of conditions such as hardware resources and precision requirements, a design method for a neural network dynamic acceleration platform based on optimal structure search is urgently needed to meet demands such as high hardware computing efficiency and a high performance-to-power ratio.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a neural network dynamic acceleration platform design method based on optimal structure search and a neural network dynamic acceleration platform, which can dynamically search for the optimal structure of a sub-neural network requiring hardware inference acceleration and complete the inference acceleration of that sub-neural network. The technical scheme adopted by the invention is as follows:
a neural network dynamic acceleration platform design method based on optimal structure search comprises the following steps:
s1, a control neural network on a control end of a neural network dynamic acceleration platform updates a structure of a sub-neural network and generates sub-neural network structure parameters according to inference accuracy fed back by the sub-neural network and accuracy requirements of preset accuracy, and meanwhile, the control end retrains the sub-neural network with the updated structure to generate weight parameters of the sub-neural network and finally generates addresses and access modes of a memory; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s2, writing a configuration file into a configuration buffer module of a hardware acceleration end in the dynamic acceleration platform of the nerve network, respectively updating the number of convolution calculation units, pooling units and linear calculation units in a core calculation module according to the number of convolution layers, the number of pooling layers and the number of full-connection layers in the configuration file, respectively updating the connection modes of the units according to parameters representing the connection modes of the layers in the configuration file, and respectively updating the structures of the convolution calculation units and the pooling units according to the structural parameters of the convolution layers and the pooling layers;
s3, reading input data through a data input unit, performing convolution operation on the input data through a convolution calculation unit, obtaining a classification result through a linear operation unit, a pooling unit and a classification unit, calculating the inference accuracy of the sub-neural network, and finally outputting the classification result and the inference accuracy through a data output unit;
s4, the inference accuracy of the sub-neural network is fed back to the control end again, the control neural network updates the sub-neural network again through the inference accuracy of the sub-neural network and the accuracy requirement of the preset accuracy, new sub-neural network structure parameters are generated, meanwhile, the control end retrains the sub-neural network with the updated structure, weight parameters of the sub-neural network are generated, and finally addresses and access modes of the memory are generated; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s5, repeatedly iterating in this way, and controlling the neural network to not update the structure of the neural network after the accuracy of the neural network returned by the hardware acceleration end is kept stable, wherein the structural parameters of the neural network are kept unchanged; when the control neural network is not updated, the optimal structure of the sub-neural network on the hardware acceleration end is searched.
A neural network dynamic acceleration platform, comprising: a control end and a hardware acceleration end;
The control end is used for training the control neural network; the control neural network updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware acceleration end, the configuration file comprising the structure parameters and weight parameters of the sub-neural network; after the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the control neural network no longer updates the sub-neural network, i.e., the optimal structure of the sub-neural network on the hardware acceleration end has been found;
The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end;
The hardware acceleration end is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control end, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control end, finally achieving hardware inference acceleration of the sub-neural network with the optimal structure.
In particular:
The structure of the sub-neural network comprises the number of convolution layers, the number of pooling layers, the number of fully connected layers, the structure of the convolution layers, the structure of the pooling layers and the connection mode of each layer; the convolution layer structure comprises the number, size and stride of the convolution kernels, and the pooling layer structure comprises the size and stride of the pooling window;
the parameters in the configuration file include:
the number of convolution layers, the number of pooling layers, the number of fully connected layers and the connection mode of each layer;
structure of the convolution layer: the number, size and stride of the convolution kernels;
structure of the pooling layer: pooling window size and stride;
each weight parameter of the trained sub-neural network;
memory addresses and memory access modes;
The hardware acceleration end comprises a data input module, a configuration buffer module, a core calculation module and a data output module; the core calculation module comprises convolution calculation units, pooling units, linear calculation units and a classification unit; the numbers of convolution calculation units, pooling units and linear calculation units correspond, respectively, to the numbers of convolution layers, pooling layers and fully connected layers in the configuration file; the hardware structure of the convolution calculation units is determined by the convolution layer structure parameters in the configuration file, and the hardware structure of the pooling units is determined by the pooling layer structure parameters in the configuration file; the hardware connection mode of the units in the core calculation module is determined by the connection mode of each layer in the configuration file;
The data input module divides the data into n single-channel data blocks according to the number of input data channels; the configuration buffer module buffers the configuration file generated on the control end; the core calculation module uses the m convolution kernels in the convolution calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride in the convolution calculation unit and correspond, respectively, to the number of convolution kernels and the stride in the configuration file; the feature maps are passed to the pooling unit for feature sampling, classification results are generated through the linear calculation unit and the classification unit, the inference accuracy of the sub-neural network is calculated at the same time, and finally the classification results and the inference accuracy are output through the data output unit.
The invention has the following advantages: the neural network dynamic acceleration platform design method based on optimal structure search can, under conditions of fixed hardware resources, variable precision and changing external parameters, dynamically search for the optimal structure of the sub-neural network on the hardware acceleration end and accelerate inference of the optimally structured sub-neural network; the method can meet the demands of neural network processing chips for a high performance-to-power ratio, low latency and variable precision, and solves the problem that prior-art processors cannot be applied efficiently to multiple functions and multiple platforms at the same time.
Drawings
Fig. 1 is a hierarchical diagram of a neural network dynamic acceleration platform based on optimal structure search according to the present invention.
Fig. 2 is a schematic diagram of the structure of the neural network dynamic acceleration platform according to the present invention.
Fig. 3 is a schematic diagram of a calculation flow of the control neural network according to the present invention.
Fig. 4 is a schematic diagram of the convolution calculation unit according to the present invention.
Fig. 5 is a schematic diagram of a linear computing unit and a classifying unit according to the present invention.
Fig. 6 is a schematic diagram of the piecewise nonlinear approximation of the Sigmoid function according to the present invention.
Detailed Description
The invention will be further described with reference to the following specific drawings and examples.
Fig. 1 shows a hierarchical diagram of the neural network dynamic acceleration platform (hereinafter, the acceleration platform) based on optimal structure search according to the present invention; the platform comprises a control layer, an application layer, a connection layer and a hardware layer;
The control layer and the application layer belong to the software layer; the control layer completes training of the control neural network and searches for the optimal structure of the sub-neural network, and also completes retraining of the optimally structured sub-neural network;
In the application layer, the user performs data input to the hardware acceleration end by calling a supported hardware programming interface;
The connection layer transfers the configuration file, formed from the sub-neural network structure parameters, weight parameters and the like, as well as the inference accuracy of the sub-neural network fed back by the hardware acceleration end;
The hardware layer mainly provides the inference acceleration function for the sub-neural network and comprises the data input module, configuration buffer module, core calculation module, data output module and other modules;
Fig. 2 shows a schematic structural diagram of the neural network dynamic acceleration platform based on optimal structure search provided by the invention; the acceleration platform comprises a control end and a hardware acceleration end;
The control end is a server containing a graphics processor and is used for training the control neural network; the control neural network updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement and generates sub-neural network structure parameters; the updated sub-neural network is retrained to generate weight parameters; meanwhile, a configuration file comprising the structure parameters and weight parameters of the sub-neural network is generated and sent to the hardware acceleration end; after the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the control neural network no longer updates the sub-neural network, i.e., the optimal structure of the sub-neural network on the hardware acceleration end has been found;
The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end; the structure of the sub-neural network comprises the number of convolution layers, the number of pooling layers, the number of fully connected layers, the structure of the convolution layers, the structure of the pooling layers and the connection mode of each layer; the convolution layer structure comprises the number, size and stride of the convolution kernels, and the pooling layer structure comprises the size and stride of the pooling window;
The parameters in the configuration file include the following (an illustrative example is given after this list):
the number of convolution layers, the number of pooling layers, the number of fully connected layers and the connection mode of each layer;
structure of the convolution layer: the number, size and stride of the convolution kernels;
structure of the pooling layer: pooling window size and stride;
each weight parameter of the trained sub-neural network;
memory addresses and memory access modes;
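For readability, the contents listed above can be pictured as a simple key-value structure. The following Python sketch is purely illustrative; every field name and value in it is an assumption for demonstration, not the actual file format of the invention:

```python
# Hypothetical illustration of the configuration file contents listed above;
# all field names and values here are assumptions for demonstration only.
config = {
    "num_conv_layers": 2,             # number of convolution layers
    "num_pool_layers": 2,             # number of pooling layers
    "num_fc_layers": 1,               # number of fully connected layers
    "connections": ["conv0", "pool0", "conv1", "pool1", "fc0"],  # layer connection mode
    "conv_layers": [                  # structure of each convolution layer
        {"kernels": 16, "kernel_size": 5, "stride": 1},
        {"kernels": 32, "kernel_size": 3, "stride": 1},
    ],
    "pool_layers": [                  # structure of each pooling layer
        {"window": 2, "stride": 2},
        {"window": 2, "stride": 2},
    ],
    "weights": "subnet_weights.bin",  # trained weight parameters of the sub-network
    "memory": {
        "base_address": 0x40000000,   # memory address generated by the control end
        "access_modes": ["master", "data", "weight"],  # memory access modes
    },
}
```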
The hardware acceleration end is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control end, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control end, finally achieving hardware inference acceleration of the sub-neural network with the optimal structure;
The hardware acceleration end comprises a data input module, a configuration buffer module, a core calculation module and a data output module; the core calculation module comprises convolution calculation units, pooling units, linear calculation units and a classification unit; the numbers of convolution calculation units, pooling units and linear calculation units correspond, respectively, to the numbers of convolution layers, pooling layers and fully connected layers in the configuration file; the hardware structure of the convolution calculation units is determined by the convolution layer structure parameters in the configuration file, and the hardware structure of the pooling units is determined by the pooling layer structure parameters in the configuration file; the hardware connection mode of the units in the core calculation module is determined by the connection mode of each layer in the configuration file;
The data input module divides the data into n single-channel data blocks according to the number of input data channels; the configuration buffer module buffers the configuration file generated on the control end; the core calculation module uses the m convolution kernels in the convolution calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride in the convolution calculation unit (corresponding, respectively, to the number of convolution kernels and the stride in the configuration file); the feature maps are passed to the pooling unit for feature sampling, classification results are generated through the linear calculation unit and the classification unit, the inference accuracy of the sub-neural network is calculated at the same time, and finally the classification results and the inference accuracy are output through the data output unit;
the invention provides a neural network dynamic acceleration platform design method based on optimal structure search, which mainly comprises the following steps:
s1, a control neural network on a control end of a neural network dynamic acceleration platform updates a structure of a sub-neural network and generates sub-neural network structure parameters according to inference accuracy fed back by the sub-neural network and accuracy requirements of preset accuracy, and meanwhile, the control end retrains the sub-neural network with the updated structure to generate weight parameters of the sub-neural network and finally generates addresses and access modes of a memory; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s2, writing a configuration file into a configuration buffer module of a hardware acceleration end in the dynamic acceleration platform of the nerve network, respectively updating the number of convolution calculation units, pooling units and linear calculation units in a core calculation module according to the number of convolution layers, the number of pooling layers and the number of full-connection layers in the configuration file, respectively updating the connection modes of the units according to parameters representing the connection modes of the layers in the configuration file, and respectively updating the structures of the convolution calculation units and the pooling units according to the structural parameters of the convolution layers and the pooling layers;
s3, reading input data through a data input unit, performing convolution operation on the input data through a convolution calculation unit, obtaining a classification result through a linear operation unit, a pooling unit and a classification unit, calculating the inference accuracy of the sub-neural network, and finally outputting the classification result and the inference accuracy through a data output unit;
s4, the inference accuracy of the sub-neural network is fed back to the control end again, the control neural network updates the sub-neural network again through the inference accuracy of the sub-neural network and the accuracy requirement of the preset accuracy, new sub-neural network structure parameters are generated, meanwhile, the control end retrains the sub-neural network with the updated structure, weight parameters of the sub-neural network are generated, and finally addresses and access modes of the memory are generated; forming a configuration file according to the structural parameters, the weight parameters, the address of the memory and the access mode of the sub-neural network;
s5, repeatedly iterating in this way, and controlling the neural network to not update the structure of the neural network after the accuracy of the neural network returned by the hardware acceleration end is kept stable, wherein the structural parameters of the neural network are kept unchanged; when the control neural network is not updated, the optimal structure of the sub-neural network on the hardware acceleration end is searched.
By this method, optimal structure search can be carried out dynamically for the sub-neural network on the hardware acceleration end, and inference acceleration of the optimally structured sub-neural network is completed.
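To make the iteration concrete, the following Python sketch outlines steps S1 to S5 as a feedback loop; the callables controller_update, retrain and hardware_infer, the stopping tolerance eps, and all other names are illustrative assumptions, not the patented implementation:

```python
# Minimal sketch of the S1-S5 feedback loop under the assumptions stated above.
def search_optimal_structure(controller_update, retrain, hardware_infer,
                             target_acc, max_iters=50, eps=1e-3):
    accuracy, prev_acc, structure = 0.0, None, None
    for _ in range(max_iters):
        # S1: the controller emits new structure parameters from the fed-back
        # accuracy and the preset accuracy requirement; the sub-network is
        # then retrained to produce weight parameters
        structure = controller_update(accuracy, target_acc)
        weights = retrain(structure)
        config = {"structure": structure, "weights": weights}
        # S2/S3: the accelerator is reconfigured from the configuration file
        # and runs inference, returning the sub-network's inference accuracy
        accuracy = hardware_infer(config)
        # S4/S5: feed the accuracy back and stop once it has stabilized,
        # i.e. the optimal structure has been found
        if prev_acc is not None and abs(accuracy - prev_acc) < eps:
            break
        prev_acc = accuracy
    return structure, accuracy
```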
In particular, the method comprises the steps of,
On the control end, the control neural network is a recurrent neural network capable of generating the configuration file;
the recurrent neural network is a tree structure, as shown in fig. 3, and the specific calculation flow is as follows:
a1 Input x) t And h t-1 The calculation is divided into two paths, one path is opposite to x t And h t-1 Multiplication generates the current level memory cell c t Wherein x is t A sequence synthesized for deducing accuracy and accuracy precision requirements of the accuracy of the sub-neural network feedback, h t-1 The structural parameters of the sub-neural network output by the neural network are controlled for the previous time;
a2 Activating the result of the previous multiplication by using an activation function ReLU to perform ReLU (h) t-1 ×x t ) Operating;
a3 For x in another path t And h t-1 Adding and performing tanh (h t-1 +x t ) Operating;
a4 A previous step result and a previous stage memory unit c t-1 Adding;
a5 Activated again using the ReLU activation function, performing a ReLU (tanh (h) t-1 +x t )+c t-1 ) Operating;
a6 After multiplying the two paths of results, inputting a sigmoid function, and finally outputting the following results:
h t =sigmoid(ReLU(x t ×h t-1 )×ReLU(tanh(h t-1 +x t )+c t-1 ))
h t a structural parameter representing a sub-neural network controlling the output of the neural network;
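For illustration, steps a1) to a6) can be written directly as code. The following NumPy sketch assumes xₜ, hₜ₋₁ and cₜ₋₁ are arrays of matching shape and treats every operation as elementwise; it is a software model of the cell, not the control end's actual implementation:

```python
import numpy as np

# Software model of the controller cell in steps a1)-a6); elementwise
# operations on equally shaped arrays are an assumption.
def controller_cell(x_t, h_prev, c_prev):
    relu = lambda v: np.maximum(v, 0.0)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_t = x_t * h_prev                    # a1): current-stage memory cell c_t
    path1 = relu(c_t)                     # a2): ReLU(h_{t-1} x x_t)
    path2 = np.tanh(h_prev + x_t)         # a3): tanh(h_{t-1} + x_t)
    path2 = relu(path2 + c_prev)          # a4)+a5): add c_{t-1}, then ReLU
    h_t = sigmoid(path1 * path2)          # a6): new structure parameters h_t
    return h_t, c_t
```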
After the inference accuracy of the sub-neural network returned by the hardware acceleration end remains stable, the control neural network no longer updates the structure of the sub-neural network, and the structure parameters of the sub-neural network remain unchanged;
On the hardware acceleration end:
The data input module divides the data into n single-channel data blocks according to the number of input data channels, the input data being a matrix converted from image data or signal data;
The configuration buffer module buffers the configuration file generated on the control end, which can be read by the core calculation module;
A convolution calculation unit performs the calculation of one convolution layer: the m convolution kernels of the layer convolve the n data blocks in turn with stride k, generating m feature maps; this is performed l times, where n is the number of input data channels and l, m and k denote, respectively, the number of convolution calculation units, the number of convolution kernels per unit and the convolution stride, corresponding to the number of convolution layers, the number of convolution kernels and the convolution stride in the configuration file; as shown in Fig. 4;
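As a software reference for this datapath, the following NumPy sketch convolves n single-channel blocks with m kernels at stride k and sums over the blocks to form each feature map; "valid" padding and the summation over blocks are interpretive assumptions, and the nested loops stand in for the parallel hardware:

```python
import numpy as np

# Software reference for one convolution calculation unit (assumptions noted
# in the text above): blocks has shape (n, H, W), kernels has shape (m, kh, kw).
def conv_unit(blocks, kernels, k):
    n, H, W = blocks.shape
    m, kh, kw = kernels.shape
    oh, ow = (H - kh) // k + 1, (W - kw) // k + 1
    fmaps = np.zeros((m, oh, ow))
    for j in range(m):                            # one feature map per kernel
        for r in range(oh):
            for c in range(ow):
                patch = blocks[:, r*k:r*k+kh, c*k:c*k+kw]
                fmaps[j, r, c] = np.sum(patch * kernels[j])  # sum over the n blocks
    return fmaps
```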
A pooling unit performs the calculation of one pooling layer, receiving data from the convolution calculation unit and performing feature sampling; this is performed o times, where o denotes the number of pooling units and corresponds to the number of pooling layers in the configuration file;
A linear calculation unit performs the calculation of one fully connected layer, receiving data from the pooling unit and performing the a = F(Wi × c + b) operation; this is repeated T times, and the result is delivered to the classification unit to obtain the classification result, which is finally output to the data output module; Wi is a weight parameter; T is the number of linear calculation units and corresponds to the number of fully connected layers in the configuration file; c is the data output by the pooling unit; b is a bias in the sub-neural network; a is the output of the fully connected layer; and F(x) denotes the activation function, typically a Sigmoid or ReLU function; as shown in Fig. 5;
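The fully connected computation above reduces to a single matrix-vector product per unit. A minimal sketch, with ReLU standing in for F and all shapes assumed:

```python
import numpy as np

# Minimal sketch of one linear calculation unit: a = F(Wi x c + b),
# with ReLU assumed as the activation F.
def linear_unit(c, Wi, b):
    relu = lambda v: np.maximum(v, 0.0)
    return relu(Wi @ c + b)   # Wi: weight matrix, c: pooling output, b: bias
```

Chaining T such units, with the last output handed to the classification unit, reproduces the flow described above.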
The invention approximates the Sigmoid function by piecewise nonlinear approximation: when x lies in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C and D are the coefficients of the third-order polynomial fit to the Sigmoid function; the implementation structure is shown in Fig. 6, and the flow for realizing the Sigmoid function in the hardware acceleration end with the fitted third-order polynomial is as follows:
(1) Different coefficients are used for x in different intervals; the coefficients A, B, C, D for each segment are stored in advance in the on-chip memory of the hardware acceleration end, and the corresponding coefficients A, B, C, D are fetched according to the input x;
(2) The output is selected by a selector: when the input x is non-negative, the output is Ax³ + Bx² + Cx + D; if the input x is negative, the output is 1 - (Ax³ + Bx² + Cx + D);
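A software model of this approximation can fit and evaluate the per-segment polynomials directly. In the sketch below the segment boundaries are assumptions and the coefficients are fitted on the fly with np.polyfit, whereas the accelerator reads pre-fitted A, B, C, D from on-chip memory; the negative branch uses the symmetry sigmoid(-x) = 1 - sigmoid(x) exactly as in step (2):

```python
import numpy as np

# Piecewise third-order approximation of the Sigmoid function; the segment
# edges below are assumed, and coefficients are fitted here rather than
# read from on-chip memory as in the accelerator.
BOUNDS = [0.0, 1.0, 2.0, 4.0, 8.0]                  # assumed segment edges for x >= 0
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
TABLE = []                                          # per-segment (lo, hi, A, B, C, D)
for lo, hi in zip(BOUNDS[:-1], BOUNDS[1:]):
    xs = np.linspace(lo, hi, 64)
    A, B, C, D = np.polyfit(xs, sigmoid(xs), 3)     # y = A*x**3 + B*x**2 + C*x + D
    TABLE.append((lo, hi, A, B, C, D))

def sigmoid_approx(x):
    neg = x < 0
    ax = min(abs(x), BOUNDS[-1] - 1e-9)             # clamp to the fitted range
    for lo, hi, A, B, C, D in TABLE:
        if lo <= ax <= hi:
            y = A*ax**3 + B*ax**2 + C*ax + D        # step (1): fetch coefficients, evaluate
            break
    return 1.0 - y if neg else y                    # step (2): 1 - y for negative x
```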
While the hardware acceleration end performs inference acceleration of the sub-neural network, the data input module and the configuration buffer module must store data through the on-chip memory of the hardware acceleration end and the off-chip memory, so the hardware acceleration end needs to obtain the addresses of the on-chip and off-chip memories; in the invention, the memory addresses are generated by the control end, the memory access mode associated with a memory address is determined by the configuration information in the configuration file, and the memory access modes include a master access mode, a data access mode, a weight access mode and the like;
The master access mode is used for data exchange between the on-chip and off-chip memories; the data access mode is used for reading data from the on-chip memory into the data input module and for storing the final classification result of the core calculation module into the memory; the weight access mode is used for reading weight parameter data from the on-chip memory;
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it; although the present invention has been described in detail with reference to examples, those skilled in the art should understand that modifications and equivalents may be made to the technical solution of the present invention without departing from its spirit and scope, all of which are intended to be encompassed within the scope of the claims of the present invention.

Claims (3)

1. A neural network dynamic acceleration platform, comprising: a control end and a hardware acceleration end;
the control end is used for training the control neural network; the control neural network updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware acceleration end, the configuration file comprising the structure parameters and weight parameters of the sub-neural network; after the inference accuracy of the sub-neural network returned by the hardware acceleration end stabilizes, the control neural network no longer updates the sub-neural network, i.e., the optimal structure of the sub-neural network on the hardware acceleration end has been found;
the sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration end;
the hardware acceleration end is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control end, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control end, finally achieving hardware inference acceleration of the sub-neural network with the optimal structure;
the structure of the sub-neural network comprises the number of convolution layers, the number of pooling layers, the number of fully connected layers, the structure of the convolution layers, the structure of the pooling layers and the connection mode of each layer; the convolution layer structure comprises the number, size and stride of the convolution kernels, and the pooling layer structure comprises the size and stride of the pooling window;
the parameters in the configuration file include:
the number of convolution layers, the number of pooling layers, the number of fully connected layers and the connection mode of each layer;
structure of the convolution layer: the number, size and stride of the convolution kernels;
structure of the pooling layer: pooling window size and stride;
each weight parameter of the trained sub-neural network;
memory addresses and memory access modes;
the hardware acceleration end comprises a data input module, a configuration buffer module, a core calculation module and a data output module; the core calculation module comprises convolution calculation units, pooling units, linear calculation units and a classification unit; the numbers of convolution calculation units, pooling units and linear calculation units correspond, respectively, to the numbers of convolution layers, pooling layers and fully connected layers in the configuration file; the hardware structure of the convolution calculation units is determined by the convolution layer structure parameters in the configuration file, and the hardware structure of the pooling units is determined by the pooling layer structure parameters in the configuration file; the hardware connection mode of the units in the core calculation module is determined by the connection mode of each layer in the configuration file;
the data input module divides the data into n single-channel data blocks according to the number of input data channels; the configuration buffer module buffers the configuration file generated on the control end; the core calculation module uses the m convolution kernels in the convolution calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride in the convolution calculation unit and correspond, respectively, to the number of convolution kernels and the stride in the configuration file; the feature maps are passed to the pooling unit for feature sampling, classification results are generated through the linear calculation unit and the classification unit, the inference accuracy of the sub-neural network is calculated at the same time, and finally the classification results and the inference accuracy are output through the data output unit;
at the control end, the control neural network is a recurrent neural network, whose calculation flow is as follows:
a1 Input x) t And h t-1 The calculation is divided into two paths, one path is opposite to x t And h t-1 Multiplication generates the current level memory cell c t Wherein x is t A sequence synthesized for deducing accuracy and accuracy precision requirements of the accuracy of the sub-neural network feedback, h t-1 The structural parameters of the sub-neural network output by the neural network are controlled for the previous time;
a2 Activating the result of the previous multiplication by using an activation function ReLU to perform ReLU (h) t-1 ×x t ) Operating;
a3 For x in another path t And h t-1 Adding and performing tanh (h t-1 +x t ) Operating;
a4 A previous step result and a previous stage memory unit c t-1 Adding;
a5 Activated again using the ReLU activation function, performing a ReLU (tanh (h) t-1 +x t )+c t-1 ) Operating;
a6 After multiplying the two paths of results, inputting a sigmoid function, and finally outputting the following results:
h t =sigmoid(ReLU(x t ×h t-1 )×ReLU(tanh(h t-1 +x t )+c t-1 ))
h t is a structural parameter representing a sub-neural network that controls the output of the neural network.
2. The neural network dynamic acceleration platform of claim 1,
wherein, on the hardware acceleration end:
the data input module is used for dividing data into n single-channel data blocks according to the number of input data channels;
the configuration buffer module is used for buffering the configuration file generated on the control end, and is read by the core calculation module;
a convolution calculation unit performs the calculation of one convolution layer: the m convolution kernels of the layer convolve the n data blocks in turn with stride k, generating m feature maps; this is performed l times, where n is the number of input data channels and l, m and k denote, respectively, the number of convolution calculation units, the number of convolution kernels per unit and the convolution stride, corresponding, respectively, to the number of convolution layers, the number of convolution kernels and the stride in the configuration file;
a pooling unit performs the calculation of one pooling layer, receiving data from the convolution calculation unit and performing feature sampling; this is performed o times, where o denotes the number of pooling units and corresponds to the number of pooling layers in the configuration file;
a linear calculation unit performs the calculation of one fully connected layer, receiving data from the pooling unit and performing the a = F(Wi × c + b) operation; this is repeated T times, and the result is delivered to the classification unit to obtain the classification result, which is finally output to the data output module; Wi is a weight parameter; T is the number of linear calculation units and corresponds to the number of fully connected layers in the configuration file; c is the data output by the pooling unit; b is a bias in the sub-neural network; a is the output of the fully connected layer; and F denotes the activation function.
3. The neural network dynamic acceleration platform of claim 2,
F is a Sigmoid or ReLU function; on the hardware acceleration end, the Sigmoid function is realized approximately by a piecewise nonlinear approximation method: when x lies in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C and D are the coefficients of the third-order polynomial fit to the Sigmoid function; the flow for realizing the Sigmoid function in the hardware acceleration end with the fitted third-order polynomial is as follows:
(1) Different coefficients are used for x in different intervals; the coefficients A, B, C, D for each segment are stored in advance in the on-chip memory of the hardware acceleration end, and the corresponding coefficients A, B, C, D are fetched according to the input x;
(2) The output is selected by a selector: when the input x is non-negative, the output is Ax³ + Bx² + Cx + D; if the input x is negative, the output is 1 - (Ax³ + Bx² + Cx + D).
CN201910175975.0A 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform Active CN109934336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Publications (2)

Publication Number Publication Date
CN109934336A CN109934336A (en) 2019-06-25
CN109934336B (en) 2023-05-16

Family

ID=66986513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910175975.0A Active CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Country Status (1)

Country Link
CN (1) CN109934336B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490303A (en) * 2019-08-19 2019-11-22 北京小米智能科技有限公司 Super-network construction method, application method, device and medium
CN110673786B (en) * 2019-09-03 2020-11-10 浪潮电子信息产业股份有限公司 Data caching method and device
CN112561028A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Method for training neural network model, and method and device for data processing
WO2021081809A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Network architecture search method and apparatus, and storage medium and computer program product
CN111027689B (en) * 2019-11-20 2024-03-22 中国航空工业集团公司西安航空计算技术研究所 Configuration method, device and computing system
CN111340221B (en) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 Neural network structure sampling method and device
CN111931926A (en) * 2020-10-12 2020-11-13 南京风兴科技有限公司 Hardware acceleration system and control method for convolutional neural network CNN
CN112613605A (en) * 2020-12-07 2021-04-06 深兰人工智能(深圳)有限公司 Neural network acceleration control method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247992A (en) * 2014-12-30 2017-10-13 合肥工业大学 A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment
CN108764466A (en) * 2018-03-07 2018-11-06 东南大学 Convolutional neural networks hardware based on field programmable gate array and its accelerated method


Also Published As

Publication number Publication date
CN109934336A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109934336B (en) Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform
US20200097806A1 (en) Processing method and accelerating device
US20200193297A1 (en) System and method for binary recurrent neural network inferencing
KR102434729B1 (en) Processing method and apparatus
WO2021208612A1 (en) Data processing method and device
CN108229671B (en) System and method for reducing storage bandwidth requirement of external data of accelerator
CN109523014B (en) News comment automatic generation method and system based on generative confrontation network model
CN111105023B (en) Data stream reconstruction method and reconfigurable data stream processor
US11314842B1 (en) Hardware implementation of mathematical functions
US11593628B2 (en) Dynamic variable bit width neural processor
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
CN110580519B (en) Convolution operation device and method thereof
JP2022534890A (en) Image processing method and apparatus, electronic equipment and storage medium
CN112633477A (en) Quantitative neural network acceleration method based on field programmable array
WO2020236255A1 (en) System and method for incremental learning using a grow-and-prune paradigm with neural networks
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
CN115017178A (en) Training method and device for data-to-text generation model
WO2022222649A1 (en) Neural network model training method and apparatus, device, and storage medium
WO2022028232A1 (en) Device and method for executing lstm neural network operation
CN109697507B (en) Processing method and device
US11507841B2 (en) Hardware accelerator for natural language processing applications
CN112183744A (en) Neural network pruning method and device
CN110009091B (en) Optimization of learning network in equivalence class space
CN115222028A (en) One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method
CN110334359B (en) Text translation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant