CN109934336A - Neural network dynamic acceleration platform design method based on optimal structure search, and neural network dynamic acceleration platform - Google Patents
Neural network dynamic acceleration platform design method based on optimal structure search, and neural network dynamic acceleration platform
- Publication number
- CN109934336A (application number CN201910175975.0A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- sub
- layer
- hardware
- configuration file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention provides a neural network dynamic acceleration platform design method based on optimal structure search. The neural network dynamic acceleration platform comprises a control end and a hardware-accelerated end. The control end trains a control neural network; according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, the control neural network updates the structure of the sub-neural network and generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware-accelerated end; the configuration file contains the structure parameters and weight parameters of the sub-neural network. Once the sub-neural network inference accuracy returned by the hardware-accelerated end has stabilized, the optimal structure of the sub-neural network on the hardware-accelerated end has been found. The sub-neural network is the neural network whose inference is to be accelerated at the hardware-accelerated end. The present invention dynamically searches for the optimal structure of the sub-neural network that requires hardware inference acceleration.
Description
Technical field
The present invention relates to the field of intelligent platform computing, and more particularly to a neural network dynamic acceleration platform design method based on optimal network structure search.
Background art
Deep neural networks (DNNs) have already demonstrated immense value, and among them convolutional neural networks (CNNs) hold a clear advantage over traditional image recognition schemes. As requirements rise, deepening the network and enlarging the database have become the mainstream development route for CNNs. At the same time, the application of deep neural networks faces several problems:
(1) Training a convolutional neural network takes considerable time, and CNN algorithms realize convolution mainly through large numbers of multiplications. The CNN model for handwritten character recognition proposed by LeCun et al. in 1998 needed fewer than 2.3 × 10⁷ multiplications; the AlexNet model designed by Krizhevsky et al. in 2012 reached 1.1 × 10⁹ multiplications; and the CNN model proposed by Simonyan and Zisserman in 2015 requires more than 1.6 × 10¹⁰.
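The multiplication count of a single convolutional layer follows directly from its dimensions. A minimal Python sketch (the layer size in the example is illustrative, not taken from the cited models):

```python
def conv_multiplications(out_h, out_w, out_ch, in_ch, k):
    """Multiplications in one convolutional layer: one k x k x in_ch
    dot product per output element."""
    return out_h * out_w * out_ch * in_ch * k * k

# Illustrative layer: 224x224 output, 64 kernels of size 3x3, 3 input channels
print(conv_multiplications(224, 224, 64, 3, 3))   # 86704128, i.e. ~8.7e7
```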
(2) A large-scale deep neural network consumes considerable power when it runs, and runs inefficiently on a general-purpose processor, because its model must be stored in external DRAM and fetched in real time during inference on images or speech. The table below shows the power consumed by basic arithmetic and storage operations in a 45 nm CMOS processor. Without optimization of the network structure and the hardware architecture, accessing and operating on model data will account for a large share of the power consumption. For embedded mobile devices in particular, this is unacceptable.
Table 1. Power consumption of processing units in 45 nm CMOS
To meet the above demands and challenges, namely high hardware computing efficiency and a high performance-to-power ratio under constraints such as hardware resources and precision requirements, a design method for a neural network dynamic acceleration platform based on optimal structure search is urgently needed.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the prior art by providing a neural network dynamic acceleration platform design method based on optimal structure search, and a neural network dynamic acceleration platform, which dynamically search for the optimal structure of the sub-neural network whose inference is to be accelerated in hardware and complete the inference acceleration of that sub-neural network. The technical solution adopted by the present invention is:
A neural network dynamic acceleration platform design method based on optimal structure search, comprising:
S1. The control neural network in the control end of the neural network dynamic acceleration platform updates the structure of the sub-neural network and generates sub-neural network structure parameters according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement; meanwhile the control end retrains the sub-neural network with the updated structure, generating the weight parameters of the sub-neural network, and finally generates memory addresses and access patterns. The structure parameters, weight parameters, memory addresses and access patterns of the sub-neural network form the configuration file.
S2. The configuration file is written into the configuration cache module of the hardware-accelerated end of the neural network dynamic acceleration platform. The numbers of convolutional calculation units, pooling units and linear computing units in the core calculation module are updated according to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the connections between units are updated according to the parameters representing the inter-layer connection pattern in the configuration file; and the structures of the convolutional calculation units and pooling units are updated according to the structure parameters of the convolutional and pooling layers, respectively.
S3. Input data is read by the data input unit and convolved by the convolutional calculation units; the classification results are then obtained through the pooling units, linear computing units and classification unit, while the inference accuracy of the sub-neural network is computed. Finally the data output unit outputs the classification results and the inference accuracy.
S4. The inference accuracy of the sub-neural network is fed back to the control end again. Using the inference accuracy of the sub-neural network and the preset accuracy requirement, the control neural network updates the sub-neural network again and generates new sub-neural network structure parameters; meanwhile the control end retrains the sub-neural network with the updated structure, generating its weight parameters, and finally generates memory addresses and access patterns. The structure parameters, weight parameters, memory addresses and access patterns of the sub-neural network form the configuration file.
S5. The above steps are iterated. Once the sub-neural network accuracy returned by the hardware-accelerated end remains stable, the control neural network no longer updates the structure of the sub-neural network and its structure parameters remain unchanged. When the control neural network no longer updates, the optimal structure of the sub-neural network on the hardware-accelerated end has been found.
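For illustration, a minimal Python sketch of the S1–S5 loop; `controller`, `retrain` and `accelerator` are hypothetical stand-ins for the control neural network, the retraining step and the hardware-accelerated end, and the stability test is one plausible choice the patent does not pin down:

```python
def make_config(structure, weights):
    """Configuration file contents per S1: structure parameters, weight
    parameters, and memory addresses/access patterns (elided here)."""
    return {"structure": structure, "weights": weights}

def search_optimal_structure(controller, retrain, accelerator,
                             accuracy_requirement, patience=5):
    """Iterate S1-S5 until the accuracy returned by the hardware-
    accelerated end stabilizes."""
    history = []
    h = controller.initial_structure()           # initial sub-network structure
    while True:
        weights = retrain(h)                     # S1: retrain the updated sub-network
        accuracy = accelerator.run(make_config(h, weights))  # S2-S3: configure and run
        history.append(accuracy)                 # S4: feed accuracy back
        recent = history[-patience:]
        if len(history) >= patience and max(recent) - min(recent) < 1e-3:
            return h                             # S5: stable -> optimal structure found
        h = controller.update(h, accuracy, accuracy_requirement)
```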
A neural network dynamic acceleration platform, comprising: a control end and a hardware-accelerated end;
The control end is used to train the control neural network. According to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, the control neural network updates the structure of the sub-neural network, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware-accelerated end; the configuration file contains the structure parameters and weight parameters of the sub-neural network. Once the sub-neural network inference accuracy returned by the hardware-accelerated end has stabilized, the control neural network stops updating the sub-neural network, i.e. the optimal structure of the sub-neural network on the hardware-accelerated end has been found;
The sub-neural network is the neural network whose inference is to be accelerated at the hardware-accelerated end;
The hardware-accelerated end is a sub-neural network hardware inference accelerator implemented as an ASIC. It receives the configuration file generated by the control end, realizes the sub-neural network in hardware, accelerates its inference process, and feeds the inference accuracy of the sub-neural network back to the control end. The result is a search for the optimal sub-neural network structure together with a hardware inference accelerator for the sub-neural network of that optimal structure.
Specifically,
The structure of the sub-neural network comprises the number of convolutional layers, the number of pooling layers, the number of fully connected layers, the structure of each convolutional layer, the structure of each pooling layer, and the connection pattern between layers. A convolutional layer's structure comprises the kernel number, size and stride, and a pooling layer's structure comprises the pooling window size and stride;
Parameters in the configuration file include:
the numbers of convolutional layers, pooling layers and fully connected layers, and the connection pattern between layers;
the structure of each convolutional layer: kernel number, size, stride;
the structure of each pooling layer: pooling window size, stride;
the weight parameters of the retrained sub-neural network;
memory addresses and memory access patterns.
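One possible in-memory layout of such a configuration file, as a minimal Python sketch (the field names are assumptions; the patent does not specify a serialization format):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ConvLayerCfg:
    num_kernels: int     # convolution kernel number
    kernel_size: int     # kernel width/height
    stride: int          # convolution stride

@dataclass
class PoolLayerCfg:
    window_size: int     # pooling window size
    stride: int          # pooling stride

@dataclass
class AcceleratorConfig:
    conv_layers: List[ConvLayerCfg]       # one entry per convolutional layer
    pool_layers: List[PoolLayerCfg]       # one entry per pooling layer
    num_fc_layers: int                    # number of fully connected layers
    connections: List[Tuple[str, str]]    # inter-layer connection pattern
    weights: Dict[str, list]              # retrained weight parameters per layer
    memory_addresses: Dict[str, int]      # on-chip / off-chip addresses
    access_patterns: Dict[str, str]       # main / data / weight access modes
```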
The hardware-accelerated end comprises a data input module, a configuration cache module, a core calculation module and a data output module. The core calculation module comprises convolutional calculation units, pooling units, linear computing units and a classification unit. The numbers of convolutional calculation units, pooling units and linear computing units correspond to the numbers of convolutional, pooling and fully connected layers in the configuration file, respectively. The hardware structure of a convolutional calculation unit is determined by the convolutional layer structure parameters in the configuration file, and that of a pooling unit by the pooling layer structure parameters; the hardware connections between units in the core calculation module are determined by the inter-layer connection pattern in the configuration file;
The data input module splits the input data into n single-channel data blocks according to the number of input channels; the configuration cache module caches the configuration file generated by the control end; the core calculation module uses the m convolution kernels of a convolutional calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the kernel number and convolution stride of the unit and correspond to the kernel number and stride in the configuration file. The feature maps are passed to the pooling units for feature sampling, after which the linear computing units and classification unit produce the classification results while the inference accuracy of the sub-neural network is computed; finally the data output unit outputs the classification results and the inference accuracy.
The present invention has the following advantages: with the neural network dynamic acceleration platform design method based on optimal structure search provided by the present invention, the optimal structure of the sub-neural network can be searched for dynamically on the hardware-accelerated end under given hardware resources, variable precision, and changing external parameters, and the inference of the sub-neural network with that optimal structure is accelerated. The method satisfies the requirements of neural network processing chips for a high performance-to-power ratio, low latency and variable precision, and solves the problem that prior-art processors cannot be applied efficiently across multiple functions and platforms at once.
Brief description of the drawings
Fig. 1 is the hierarchy diagram of the neural network dynamic acceleration platform based on optimal structure search of the present invention.
Fig. 2 is a schematic diagram of the structure of the neural network dynamic acceleration platform of the present invention.
Fig. 3 is a schematic diagram of the calculation flow of the control neural network of the present invention.
Fig. 4 is a schematic diagram of the calculation in a convolutional calculation unit of the present invention.
Fig. 5 is a schematic diagram of a linear computing unit and the classification unit of the present invention.
Fig. 6 is a schematic diagram of approximating the Sigmoid function by the piecewise nonlinear approximation method of the present invention.
Specific embodiment
The invention is further described below with reference to the drawings and specific embodiments.
Fig. 1 shows the hierarchy diagram of the neural network dynamic acceleration platform based on optimal structure search proposed by the present invention (hereinafter the acceleration platform). It comprises a control layer, an application layer, a connection layer and a hardware layer;
The control layer and the application layer belong to the software level. The control layer trains the control neural network and searches for the optimal structure of the sub-neural network, and also completes the retraining of the sub-neural network of the optimal structure;
Through the application layer, users provide data input to the hardware-accelerated end by calling the supported hardware programming interface;
The connection layer transmits the configuration file composed of the sub-neural network structure parameters, weight parameters, etc., and the sub-neural network inference accuracy fed back by the hardware-accelerated end;
The hardware layer mainly provides the inference acceleration function for the sub-neural network, and includes the data input module, configuration cache module, core calculation module, data output module and so on;
Fig. 2 shows a schematic diagram of the structure of the neural network dynamic acceleration platform based on optimal structure search proposed by the present invention; the acceleration platform comprises a control end and a hardware-accelerated end;
The control end is a server containing a graphics processor and is used to train the control neural network. According to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, the control neural network updates the structure of the sub-neural network and generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware-accelerated end; the configuration file contains the structure parameters and weight parameters of the sub-neural network. Once the sub-neural network inference accuracy returned by the hardware-accelerated end has stabilized, the control neural network stops updating the sub-neural network, and the optimal structure of the sub-neural network on the hardware-accelerated end has been found;
The sub-neural network is the neural network whose inference is to be accelerated at the hardware-accelerated end. Its structure comprises the number of convolutional layers, the number of pooling layers, the number of fully connected layers, the structure of each convolutional layer, the structure of each pooling layer, and the connection pattern between layers; a convolutional layer's structure comprises the kernel number, size and stride, and a pooling layer's structure comprises the pooling window size and stride;
Parameters in the configuration file include:
the numbers of convolutional layers, pooling layers and fully connected layers, and the connection pattern between layers;
the structure of each convolutional layer: kernel number, size, stride;
the structure of each pooling layer: pooling window size, stride;
the weight parameters of the retrained sub-neural network;
memory addresses and memory access patterns;
The hardware-accelerated end is a sub-neural network hardware inference accelerator implemented as an ASIC. It receives the configuration file generated by the control end, realizes the sub-neural network in hardware, accelerates its inference process, and feeds the inference accuracy of the sub-neural network back to the control end; the result is a search for the optimal sub-neural network structure together with a hardware inference accelerator for the sub-neural network of that optimal structure;
The hardware-accelerated end comprises a data input module, a configuration cache module, a core calculation module and a data output module. The core calculation module comprises convolutional calculation units, pooling units, linear computing units and a classification unit. The numbers of convolutional calculation units, pooling units and linear computing units correspond to the numbers of convolutional, pooling and fully connected layers in the configuration file, respectively. The hardware structure of a convolutional calculation unit is determined by the convolutional layer structure parameters in the configuration file, and that of a pooling unit by the pooling layer structure parameters; the hardware connections between units in the core calculation module are determined by the inter-layer connection pattern in the configuration file;
The data input module splits the input data into n single-channel data blocks according to the number of input channels; the configuration cache module caches the configuration file generated by the control end; the core calculation module uses the m convolution kernels of a convolutional calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the kernel number and convolution stride of the unit (corresponding to the kernel number and stride in the configuration file). The feature maps are passed to the pooling units for feature sampling, after which the linear computing units and classification unit produce the classification results while the inference accuracy of the sub-neural network is computed; finally the data output unit outputs the classification results and the inference accuracy;
The main flow of the neural network dynamic acceleration platform design method based on optimal structure search proposed by the present invention is as follows:
S1. The control neural network in the control end of the neural network dynamic acceleration platform updates the structure of the sub-neural network and generates sub-neural network structure parameters according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement; meanwhile the control end retrains the sub-neural network with the updated structure, generating the weight parameters of the sub-neural network, and finally generates memory addresses and access patterns. The structure parameters, weight parameters, memory addresses and access patterns of the sub-neural network form the configuration file.
S2. The configuration file is written into the configuration cache module of the hardware-accelerated end of the neural network dynamic acceleration platform. The numbers of convolutional calculation units, pooling units and linear computing units in the core calculation module are updated according to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the connections between units are updated according to the parameters representing the inter-layer connection pattern in the configuration file; and the structures of the convolutional calculation units and pooling units are updated according to the structure parameters of the convolutional and pooling layers, respectively.
S3. Input data is read by the data input unit and convolved by the convolutional calculation units; the classification results are then obtained through the pooling units, linear computing units and classification unit, while the inference accuracy of the sub-neural network is computed. Finally the data output unit outputs the classification results and the inference accuracy.
S4. The inference accuracy of the sub-neural network is fed back to the control end again. Using the inference accuracy of the sub-neural network and the preset accuracy requirement, the control neural network updates the sub-neural network again and generates new sub-neural network structure parameters; meanwhile the control end retrains the sub-neural network with the updated structure, generating its weight parameters, and finally generates memory addresses and access patterns. The structure parameters, weight parameters, memory addresses and access patterns of the sub-neural network form the configuration file.
S5. The above steps are iterated. Once the sub-neural network accuracy returned by the hardware-accelerated end remains stable, the control neural network no longer updates the structure of the sub-neural network and its structure parameters remain unchanged. When the control neural network no longer updates, the optimal structure of the sub-neural network on the hardware-accelerated end has been found.
By this method, the optimal structure of the sub-neural network on the hardware-accelerated end can be searched for dynamically, and the accelerated inference of the sub-neural network of the optimal structure is completed.
Specifically,
At the control end, the control neural network is a recurrent neural network, which produces the configuration file. The recurrent neural network is tree-structured, as shown in Fig. 3; its calculation flow is as follows:
A1) The inputs x_t and h_{t-1} are processed along two paths. One path multiplies x_t and h_{t-1} to produce the current memory unit c_t, where x_t is the sequence formed from the inference accuracy fed back by the sub-neural network and the accuracy requirement, and h_{t-1} is the sub-neural network structure parameters output by the control neural network at the previous step;
A2) The product of the previous step is activated with the ReLU activation function: ReLU(h_{t-1} × x_t);
A3) The other path first adds x_t and h_{t-1} and applies the tanh activation function: tanh(h_{t-1} + x_t);
A4) The result of the previous step is added to the previous memory unit c_{t-1};
A5) The ReLU activation function is applied again: ReLU(tanh(h_{t-1} + x_t) + c_{t-1});
A6) The two paths are multiplied and passed through the sigmoid function, giving the final output:
h_t = sigmoid(ReLU(x_t × h_{t-1}) × ReLU(tanh(h_{t-1} + x_t) + c_{t-1}))
where h_t represents the structure parameters of the sub-neural network output by the control neural network;
Once the sub-neural network accuracy returned by the hardware-accelerated end remains stable, the control neural network no longer updates the structure of the sub-neural network, and the structure parameters of the sub-neural network remain unchanged;
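Steps A1)–A6) can be transcribed directly. A minimal NumPy sketch, assuming x_t, h_{t-1} and c_{t-1} are equal-length vectors combined elementwise:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def controller_cell(x_t, h_prev, c_prev):
    """One step of the controller recurrent cell (A1-A6)."""
    c_t = x_t * h_prev                        # A1: current memory unit
    branch1 = relu(h_prev * x_t)              # A2: ReLU on the product path
    branch2 = np.tanh(h_prev + x_t)           # A3: tanh on the sum path
    branch2 = relu(branch2 + c_prev)          # A4-A5: add previous memory, ReLU
    h_t = sigmoid(branch1 * branch2)          # A6: multiply paths, sigmoid
    return h_t, c_t
```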
At the hardware-accelerated end:
The data input module splits the input data into n single-channel data blocks according to the number of input channels; the input data is image data or a matrix converted from signal data;
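A one-line sketch of this split, assuming a channel-first NumPy layout:

```python
import numpy as np

def split_channels(data: np.ndarray) -> list:
    """Split (n, H, W) multi-channel input into n single-channel blocks."""
    return [data[i] for i in range(data.shape[0])]

# Example: a 3-channel 32x32 image becomes three 32x32 blocks
blocks = split_channels(np.zeros((3, 32, 32)))
```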
The configuration cache module caches the configuration file generated by the control end, and it can be read by the core calculation module;
One convolutional calculation unit performs the calculation of one convolutional layer. The m convolution kernels of each convolutional layer convolve the n data blocks in turn with stride k, generating m feature maps; this is done l times, where n is the number of input channels, and l, m and k denote the number of convolutional calculation units, the kernel number of a convolutional calculation unit, and the convolution stride, corresponding to the number of convolutional layers, the kernel number and the convolution stride in the configuration file, as shown in Fig. 4;
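A minimal NumPy sketch of what one convolutional calculation unit computes — m kernels slid over the n single-channel blocks with stride k (valid padding assumed; the patent does not specify padding):

```python
import numpy as np

def conv_unit(blocks, kernels, stride):
    """One convolutional calculation unit: blocks is (n, H, W) single-channel
    data, kernels is (m, n, kh, kw); returns m feature maps."""
    n, H, W = blocks.shape
    m, _, kh, kw = kernels.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    maps = np.zeros((m, out_h, out_w))
    for j in range(m):                        # one feature map per kernel
        for y in range(out_h):
            for x in range(out_w):
                patch = blocks[:, y*stride:y*stride+kh, x*stride:x*stride+kw]
                maps[j, y, x] = np.sum(patch * kernels[j])
    return maps
```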
One pooling unit performs the calculation of one pooling layer: it receives the data from a convolutional calculation unit and performs feature sampling; this is done o times, where o is the number of pooling units, corresponding to the number of pooling layers in the configuration file;
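A matching sketch of one pooling unit; max pooling is assumed, since the patent says only "feature sampling":

```python
import numpy as np

def pool_unit(feature_maps, window, stride):
    """One pooling unit: feature_maps is (m, H, W); max pooling over
    window x window regions with the given stride."""
    m, H, W = feature_maps.shape
    out_h = (H - window) // stride + 1
    out_w = (W - window) // stride + 1
    out = np.zeros((m, out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            region = feature_maps[:, y*stride:y*stride+window,
                                     x*stride:x*stride+window]
            out[:, y, x] = region.max(axis=(1, 2))
    return out
```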
One linear computing unit performs the calculation of one fully connected layer: it receives the data from the pooling units and computes a = F(Wi × c + b); this is repeated T times, and the output is passed to the classification unit to obtain the classification results, which are finally output to the data output module. Here Wi is the weight parameter; T is the number of linear computing units, corresponding to the number of fully connected layers in the configuration file; c is the data output by the pooling units; b is the bias in the sub-neural network; a is the output of the fully connected layer; and F(·) denotes the activation function, typically Sigmoid or ReLU, as shown in Fig. 5;
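A minimal sketch of the T chained fully connected layers a = F(Wi × c + b), taking F to be ReLU:

```python
import numpy as np

def linear_units(c, weights, biases):
    """Apply T fully connected layers in sequence: a = F(Wi x c + b).
    weights: list of T matrices Wi; biases: list of T vectors b."""
    a = c
    for Wi, b in zip(weights, biases):
        a = np.maximum(Wi @ a + b, 0.0)       # F = ReLU
    return a                                   # handed to the classification unit
```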
The present invention approximates the Sigmoid function with a piecewise nonlinear approximation method. For x in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C, D are the coefficients of the third-order polynomial fitted to the Sigmoid function. The implementation structure is shown in Fig. 6; the process of realizing the Sigmoid function with fitted third-order polynomials at the hardware-accelerated end is:
(1) x in different intervals has different coefficients. The coefficients A, B, C, D of each piecewise interval are stored in advance in the on-chip memory of the hardware-accelerated end, and the corresponding A, B, C, D are fetched according to the input x;
(2) a selector chooses the output: when the input x is nonnegative, the result Ax³ + Bx² + Cx + D is output; if the input x is negative, the result 1 − (Ax³ + Bx² + Cx + D) is output;
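A minimal Python sketch of this scheme; the breakpoints are assumptions, and the coefficients are fitted at load time rather than reproduced from the patent's stored values:

```python
import numpy as np

# Fit one cubic per interval at load time; in hardware these coefficients
# would instead be precomputed and stored in on-chip memory.
BREAKS = [0.0, 1.0, 2.0, 4.0, 8.0]
COEFFS = []
for lo, hi in zip(BREAKS, BREAKS[1:]):
    xs = np.linspace(lo, hi, 50)
    COEFFS.append(np.polyfit(xs, 1.0 / (1.0 + np.exp(-xs)), 3))  # A, B, C, D

def sigmoid_approx(x):
    """Piecewise cubic y = Ax^3 + Bx^2 + Cx + D on |x|, with the
    selector path 1 - y for negative inputs."""
    ax = min(abs(x), BREAKS[-1] - 1e-9)        # clamp into the fitted range
    for (lo, hi), coef in zip(zip(BREAKS, BREAKS[1:]), COEFFS):
        if lo <= ax < hi:
            y = np.polyval(coef, ax)
            break
    return y if x >= 0 else 1.0 - y
```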
While the hardware-accelerated end performs sub-neural network inference acceleration, the data input module and the configuration cache module need to store data in the on-chip and off-chip memories of the hardware-accelerated end, so the hardware-accelerated end must obtain the on-chip and off-chip memory addresses. In the present invention, the memory addresses are generated by the control end, and the memory access patterns determined by those addresses are in turn determined by the configuration information in the configuration file; the memory access patterns include the main access pattern, the data access pattern, the weight access pattern, and so on.
The main access pattern serves data exchange between on-chip and off-chip memory; the data access pattern serves reading data from on-chip memory into the data input module and storing the final classification results of the core calculation module back to memory; the weight access pattern serves reading weight parameter data from on-chip memory;
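One way the three access patterns could be encoded in the configuration information, as a hedged sketch (all field names and values are assumptions; the patent names the patterns but not their fields):

```python
ACCESS_PATTERNS = {
    # main pattern: data exchange between on-chip and off-chip memory
    "main":   {"src": "off_chip", "dst": "on_chip", "burst_words": 64},
    # data pattern: feed the data input module, write classification results back
    "data":   {"src": "on_chip", "dst": "core", "burst_words": 16},
    # weight pattern: stream weight parameters into the compute units
    "weight": {"src": "on_chip", "dst": "core", "burst_words": 32},
}
```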
Finally, it should be noted that the above specific embodiments are only used to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to examples, those skilled in the art should understand that modifications or equivalent replacements may be made to the technical solution of the present invention without departing from its spirit and scope, all of which shall be covered by the scope of the claims of the present invention.
Claims (6)
1. A neural network dynamic acceleration platform, characterized by comprising: a control end and a hardware-accelerated end;
the control end is used to train a control neural network; according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement, the control neural network updates the structure of the sub-neural network, generates sub-neural network structure parameters, retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware-accelerated end, the configuration file containing the structure parameters and weight parameters of the sub-neural network; once the sub-neural network inference accuracy returned by the hardware-accelerated end has stabilized, the control neural network stops updating the sub-neural network, i.e. the optimal structure of the sub-neural network on the hardware-accelerated end has been found;
the sub-neural network is the neural network whose inference is to be accelerated at the hardware-accelerated end;
the hardware-accelerated end is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control end, realizes the sub-neural network in hardware, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control end, finally achieving the search for the optimal sub-neural network structure and completing the hardware inference accelerator for the sub-neural network of the optimal structure.
2. The neural network dynamic acceleration platform of claim 1, characterized in that
the structure of the sub-neural network comprises the number of convolutional layers, the number of pooling layers, the number of fully connected layers, the structure of each convolutional layer, the structure of each pooling layer, and the connection pattern between layers; a convolutional layer's structure comprises the kernel number, size and stride, and a pooling layer's structure comprises the pooling window size and stride;
parameters in the configuration file include:
the numbers of convolutional layers, pooling layers and fully connected layers, and the connection pattern between layers;
the structure of each convolutional layer: kernel number, size, stride;
the structure of each pooling layer: pooling window size, stride;
the weight parameters of the retrained sub-neural network;
memory addresses and memory access patterns;
the hardware-accelerated end comprises a data input module, a configuration cache module, a core calculation module and a data output module; the core calculation module comprises convolutional calculation units, pooling units, linear computing units and a classification unit, the numbers of convolutional calculation units, pooling units and linear computing units corresponding to the numbers of convolutional, pooling and fully connected layers in the configuration file, respectively; the hardware structure of a convolutional calculation unit is determined by the convolutional layer structure parameters in the configuration file, and that of a pooling unit by the pooling layer structure parameters; the hardware connections between units in the core calculation module are determined by the inter-layer connection pattern in the configuration file;
the data input module splits the input data into n single-channel data blocks according to the number of input channels; the configuration cache module caches the configuration file generated by the control end; the core calculation module uses the m convolution kernels of a convolutional calculation unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the kernel number and convolution stride of the unit, corresponding to the kernel number and stride in the configuration file; the feature maps are passed to the pooling units for feature sampling, after which the linear computing units and classification unit produce the classification results while the inference accuracy of the sub-neural network is computed; finally the data output unit outputs the classification results and the inference accuracy.
3. The neural network dynamic acceleration platform of claim 1, characterized in that
the control neural network at the control end is a recurrent neural network, whose calculation flow is as follows:
A1) the inputs x_t and h_{t-1} are processed along two paths; one path multiplies x_t and h_{t-1} to produce the current memory unit c_t, where x_t is the sequence formed from the inference accuracy fed back by the sub-neural network and the accuracy requirement, and h_{t-1} is the sub-neural network structure parameters output by the control neural network at the previous step;
A2) the product of the previous step is activated with the ReLU activation function: ReLU(h_{t-1} × x_t);
A3) the other path first adds x_t and h_{t-1} and applies the tanh activation function: tanh(h_{t-1} + x_t);
A4) the result of the previous step is added to the previous memory unit c_{t-1};
A5) the ReLU activation function is applied again: ReLU(tanh(h_{t-1} + x_t) + c_{t-1});
A6) the two paths are multiplied and passed through the sigmoid function, giving the final output:
h_t = sigmoid(ReLU(x_t × h_{t-1}) × ReLU(tanh(h_{t-1} + x_t) + c_{t-1}))
where h_t represents the structure parameters of the sub-neural network output by the control neural network.
4. The neural network dynamic acceleration platform of claim 2, characterized in that,
at the hardware-accelerated end:
the data input module splits the input data into n single-channel data blocks according to the number of input channels;
the configuration cache module caches the configuration file generated by the control end and is read by the core calculation module;
one convolutional calculation unit performs the calculation of one convolutional layer; the m convolution kernels of each convolutional layer convolve the n data blocks in turn with stride k, generating m feature maps; this is done l times, where n is the number of input channels, and l, m and k denote the number of convolutional calculation units, the kernel number of a convolutional calculation unit, and the convolution stride, corresponding to the number of convolutional layers, the kernel number and the stride in the configuration file, respectively;
one pooling unit performs the calculation of one pooling layer; it receives the data from a convolutional calculation unit and performs feature sampling; this is done o times, where o is the number of pooling units, corresponding to the number of pooling layers in the configuration file;
one linear computing unit performs the calculation of one fully connected layer; it receives the data from the pooling units and computes a = F(Wi × c + b); this is repeated T times, the output is passed to the classification unit to obtain the classification results, and the final classification results are output to the data output module; where Wi is the weight parameter, T is the number of linear computing units, corresponding to the number of fully connected layers in the configuration file, c is the data output by the pooling units, b is the bias in the sub-neural network, a is the output of the fully connected layer, and F(·) denotes the activation function.
5. The neural network dynamic acceleration platform of claim 4, characterized in that
F(·) is the Sigmoid or ReLU function; at the hardware-accelerated end, the Sigmoid function is approximated by a piecewise nonlinear approximation method: for x in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C, D are the coefficients of the third-order polynomial fitted to the Sigmoid function; the process of realizing the Sigmoid function with fitted third-order polynomials at the hardware-accelerated end is:
(1) x in different intervals has different coefficients; the coefficients A, B, C, D of each piecewise interval are stored in advance in the on-chip memory of the hardware-accelerated end, and the corresponding A, B, C, D are fetched according to the input x;
(2) a selector chooses the output: when the input x is nonnegative, the output is Ax³ + Bx² + Cx + D; if the input x is negative, the output is 1 − (Ax³ + Bx² + Cx + D).
6. A neural network dynamic acceleration platform design method based on optimal structure search, characterized by comprising:
S1. the control neural network in the control end of the neural network dynamic acceleration platform updates the structure of the sub-neural network and generates sub-neural network structure parameters according to the inference accuracy fed back by the sub-neural network and a preset accuracy requirement; meanwhile the control end retrains the sub-neural network with the updated structure, generating the weight parameters of the sub-neural network, and finally generates memory addresses and access patterns; the structure parameters, weight parameters, memory addresses and access patterns of the sub-neural network form the configuration file;
S2. the configuration file is written into the configuration cache module of the hardware-accelerated end of the neural network dynamic acceleration platform; the numbers of convolutional calculation units, pooling units and linear computing units in the core calculation module are updated according to the numbers of convolutional, pooling and fully connected layers in the configuration file, the connections between units are updated according to the parameters representing the inter-layer connection pattern in the configuration file, and the structures of the convolutional calculation units and pooling units are updated according to the structure parameters of the convolutional and pooling layers, respectively;
S3. the input data is read by the data input unit and convolved by the convolutional calculation units; the classification results are then obtained through the pooling units, linear computing units and classification unit while the inference accuracy of the sub-neural network is computed; finally the data output unit outputs the classification results and the inference accuracy;
S4. the inference accuracy of the sub-neural network is fed back to the control end again; using the inference accuracy of the sub-neural network and the preset accuracy requirement, the control neural network updates the sub-neural network again and generates new sub-neural network structure parameters; meanwhile the control end retrains the sub-neural network with the updated structure, generating the weight parameters of the sub-neural network, and finally generates memory addresses and access patterns; the structure parameters, weight parameters, memory addresses and access patterns of the sub-neural network form the configuration file;
S5. the above steps are iterated; once the sub-neural network accuracy returned by the hardware-accelerated end remains stable, the control neural network no longer updates the structure of the sub-neural network and the structure parameters of the sub-neural network remain unchanged; when the control neural network no longer updates, the optimal structure of the sub-neural network on the hardware-accelerated end has been found.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910175975.0A CN109934336B (en) | 2019-03-08 | 2019-03-08 | Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109934336A true CN109934336A (en) | 2019-06-25 |
CN109934336B CN109934336B (en) | 2023-05-16 |
Family
ID=66986513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910175975.0A Active CN109934336B (en) | 2019-03-08 | 2019-03-08 | Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109934336B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107247992A (en) * | 2014-12-30 | 2017-10-13 | 合肥工业大学 | A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm |
CN107451653A (en) * | 2017-07-05 | 2017-12-08 | 深圳市自行科技有限公司 | Computational methods, device and the readable storage medium storing program for executing of deep neural network |
CN108280514A (en) * | 2018-01-05 | 2018-07-13 | 中国科学技术大学 | Sparse neural network acceleration system based on FPGA and design method |
CN108764466A (en) * | 2018-03-07 | 2018-11-06 | 东南大学 | Convolutional neural networks hardware based on field programmable gate array and its accelerated method |
CN108710941A (en) * | 2018-04-11 | 2018-10-26 | 杭州菲数科技有限公司 | The hard acceleration method and device of neural network model for electronic equipment |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2721181C1 (en) * | 2019-08-19 | 2020-05-18 | Бейджин Сяоми Интеллиджент Текнолоджи Ко., Лтд. | Super network construction method, method of use, device and data medium |
US11803475B2 (en) | 2019-09-03 | 2023-10-31 | Inspur Electronic Information Industry Co., Ltd. | Method and apparatus for data caching |
CN110673786A (en) * | 2019-09-03 | 2020-01-10 | 浪潮电子信息产业股份有限公司 | Data caching method and device |
CN112561028A (en) * | 2019-09-25 | 2021-03-26 | 华为技术有限公司 | Method for training neural network model, and method and device for data processing |
WO2021081809A1 (en) * | 2019-10-30 | 2021-05-06 | 深圳市大疆创新科技有限公司 | Network architecture search method and apparatus, and storage medium and computer program product |
CN112106077A (en) * | 2019-10-30 | 2020-12-18 | 深圳市大疆创新科技有限公司 | Method, apparatus, storage medium, and computer program product for network structure search |
CN111027689A (en) * | 2019-11-20 | 2020-04-17 | 中国航空工业集团公司西安航空计算技术研究所 | Configuration method, device and computing system |
CN111027689B (en) * | 2019-11-20 | 2024-03-22 | 中国航空工业集团公司西安航空计算技术研究所 | Configuration method, device and computing system |
CN111340221A (en) * | 2020-02-25 | 2020-06-26 | 北京百度网讯科技有限公司 | Method and device for sampling neural network structure |
CN111340221B (en) * | 2020-02-25 | 2023-09-12 | 北京百度网讯科技有限公司 | Neural network structure sampling method and device |
CN111782398A (en) * | 2020-06-29 | 2020-10-16 | 上海商汤智能科技有限公司 | Data processing method, device and system and related equipment |
CN111931926A (en) * | 2020-10-12 | 2020-11-13 | 南京风兴科技有限公司 | Hardware acceleration system and control method for convolutional neural network CNN |
CN112613605A (en) * | 2020-12-07 | 2021-04-06 | 深兰人工智能(深圳)有限公司 | Neural network acceleration control method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109934336B (en) | 2023-05-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |