CN109934336A - Neural network dynamic acceleration platform design method based on optimal structure search, and neural network dynamic acceleration platform - Google Patents


Info

Publication number
CN109934336A
CN109934336A (application CN201910175975.0A)
Authority
CN
China
Prior art keywords
neural network
sub
layer
hardware
configuration file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910175975.0A
Other languages
Chinese (zh)
Other versions
CN109934336B (en)
Inventor
虞致国
马晓杰
顾晓峰
魏敬和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201910175975.0A priority Critical patent/CN109934336B/en
Publication of CN109934336A publication Critical patent/CN109934336A/en
Application granted granted Critical
Publication of CN109934336B publication Critical patent/CN109934336B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Feedback Control In General (AREA)

Abstract

The present invention provides a neural network dynamic acceleration platform design method based on optimal structure search. The neural network dynamic acceleration platform comprises a control terminal and a hardware acceleration terminal. The control terminal is used to train a control neural network; according to the inference accuracy fed back by a sub-neural network and a preset accuracy requirement, the control neural network updates the structure of the sub-neural network and generates sub-neural network structural parameters; the control terminal then retrains the updated sub-neural network to generate weight parameters, and generates a configuration file that is sent to the hardware acceleration terminal, the configuration file containing the structural parameters and weight parameters of the sub-neural network. Once the sub-neural network inference accuracy returned by the hardware acceleration terminal stabilizes, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found. The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal. The present invention dynamically performs optimal structure search for the sub-neural network requiring hardware inference acceleration.

Description

Neural network dynamic acceleration platform design method based on optimal structure search, and neural network dynamic acceleration platform
Technical field
The present invention relates to the field of intelligent platform computing, and more particularly to a neural network dynamic acceleration platform design method based on optimal network structure search.
Background technique
Deep neural networks (DNNs) have already shown immense value. Among deep neural networks, convolutional neural networks (CNNs) hold a clear advantage over traditional image recognition schemes. As user requirements rise, deepening the network and enlarging the database have become the mainstream technical route of CNN development. At the same time, the application of deep neural networks faces several problems:
(1) Training a convolutional neural network takes considerable time. CNN algorithms realize convolution mainly through large numbers of multiplications: the CNN model for handwritten-character recognition proposed by LeCun et al. in 1998 required fewer than 2.3 × 10⁷ multiplications; the CNN model named AlexNet designed by Krizhevsky et al. in 2012 reached 1.1 × 10⁹ multiplications; and the CNN model proposed by Simonyan and Zisserman in 2015 required more than 1.6 × 10¹⁰ multiplications.
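To put such figures in context, the multiplication count of one convolutional layer follows directly from its shape. The short sketch below is illustrative only; the counting formula is standard, and the example layer shape (AlexNet's first layer) is chosen for illustration, not drawn from this patent:

    # Multiplications needed by one convolutional layer: each output pixel of
    # each output channel costs c_in * k * k multiply operations.
    def conv_multiplications(h_out, w_out, c_out, c_in, k):
        return h_out * w_out * c_out * c_in * k * k

    # AlexNet's first layer: 96 kernels of 11x11 over 3 input channels,
    # producing a 55x55 output per kernel.
    print(conv_multiplications(55, 55, 96, 3, 11))  # 105415200, about 1.05 x 10^8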
(2) Running a large-scale deep neural network consumes considerable power, and it runs inefficiently on a general-purpose processor, because the model must be stored in external DRAM and fetched in real time during inference on images or speech. The table below shows the power consumption of basic arithmetic and memory operations in a 45 nm CMOS processor. Without optimization of the network structure and the hardware architecture, accessing and operating on the model data occupies a large share of the power budget, which is unacceptable especially for embedded mobile terminals.
Table 1. Power consumption of processing operations in a 45 nm CMOS processor
To meet the above demands and challenges, that is, high hardware computational efficiency and high performance-per-watt under constraints such as hardware resources and accuracy requirements, there is an urgent need for a design method for a neural network dynamic acceleration platform based on optimal structure search.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the prior art and to provide a neural network dynamic acceleration platform design method based on optimal structure search, and a neural network dynamic acceleration platform, which dynamically performs optimal structure search for the sub-neural network whose inference is to be hardware-accelerated and completes inference acceleration of that sub-neural network. The technical solution adopted by the present invention is:
A neural network dynamic acceleration platform design method based on optimal structure search, comprising:
S1: the control neural network in the control terminal of the neural network dynamic acceleration platform updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, and generates sub-neural network structural parameters; meanwhile the control terminal retrains the sub-neural network with the updated structure, generates the weight parameters of the sub-neural network, and finally generates the memory addresses and access modes; a configuration file is formed from the structural parameters and weight parameters of the sub-neural network and the memory addresses and access modes;
S2: the configuration file is written into the configuration cache module of the hardware acceleration terminal of the neural network dynamic acceleration platform; the numbers of convolution computing units, pooling units and linear computing units in the core computing module are updated according to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the connections between the units are updated according to the parameters in the configuration file representing the connection mode of each layer; and the structures of the convolution computing units and pooling units are updated according to the structural parameters of the convolutional layers and pooling layers, respectively;
S3: the input data are read by the data input unit and convolved by the convolution computing units; classification results are then obtained through the pooling units, linear computing units and classification unit; meanwhile the inference accuracy of the sub-neural network is computed, and finally the classification results and inference accuracy are output by the data output unit;
S4: the inference accuracy of the sub-neural network is fed back to the control terminal again; the control neural network updates the sub-neural network again according to the inference accuracy of the sub-neural network and the preset accuracy requirement, and generates new sub-neural network structural parameters; meanwhile the control terminal retrains the sub-neural network with the updated structure, generates the weight parameters of the sub-neural network, and finally generates the memory addresses and access modes; a configuration file is formed from the structural parameters and weight parameters of the sub-neural network and the memory addresses and access modes;
S5: after repeated iterations, once the sub-neural network accuracy returned by the hardware acceleration terminal remains stable, the control neural network no longer updates the structure of the sub-neural network and the structural parameters of the sub-neural network remain unchanged; when the control neural network no longer updates, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found.
A neural network dynamic acceleration platform, comprising: a control terminal and a hardware acceleration terminal;
the control terminal is used to train the control neural network; according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, the control neural network updates the structure of the sub-neural network and generates sub-neural network structural parameters; the control terminal retrains the updated sub-neural network to generate weight parameters, and generates a configuration file sent to the hardware acceleration terminal, the configuration file containing the structural parameters and weight parameters of the sub-neural network; once the sub-neural network inference accuracy returned by the hardware acceleration terminal stabilizes, the control neural network stops updating the sub-neural network, i.e. the optimal structure of the sub-neural network on the hardware acceleration terminal has been found;
the sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal;
the hardware acceleration terminal is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control terminal, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control terminal; the search for the optimal sub-neural network structure and a hardware inference accelerator for the optimal-structure sub-neural network are thereby both achieved.
Specifically,
The structure of the sub-neural network includes the number of convolutional layers, the number of pooling layers, the number of fully connected layers, the structure of the convolutional layers, the structure of the pooling layers and the connection mode of each layer; the convolutional layer structure includes the number, size and stride of the convolution kernels; the pooling layer structure includes the pooling window size and stride;
The parameters in the configuration file include:
the numbers of convolutional layers, pooling layers and fully connected layers, and the connection mode of each layer;
the structure of the convolutional layers: number, size and stride of the convolution kernels;
the structure of the pooling layers: pooling window size and stride;
the weight parameters of the retrained sub-neural network;
memory addresses and memory access modes;
The hardware acceleration terminal includes a data input module, a configuration cache module, a core computing module and a data output module; the core computing module includes convolution computing units, pooling units, linear computing units and a classification unit; the numbers of convolution computing units, pooling units and linear computing units correspond respectively to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the hardware structure of a convolution computing unit is determined by the convolutional layer structural parameters in the configuration file, and the hardware structure of a pooling unit is determined by the pooling layer structural parameters in the configuration file; the hardware connections between the units of the core computing module are determined by the connection mode of each layer in the configuration file;
the data input module splits the input data into n single-channel data blocks according to the number of input data channels; the configuration cache module caches the configuration file generated by the control terminal; the core computing module uses the m convolution kernels of a convolution computing unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride of the convolution computing unit, corresponding respectively to the kernel number and stride in the configuration file; the feature maps are passed to the pooling unit for feature sampling, after which the linear computing units and the classification unit produce the classification results; meanwhile the inference accuracy of the sub-neural network is computed, and finally the classification results and inference accuracy are output by the data output unit.
The present invention has the following advantages: with the neural network dynamic acceleration platform design method based on optimal structure search provided by the present invention, the optimal sub-neural network structure can be searched dynamically on the hardware acceleration terminal under given hardware resources, variable precision and changing external parameters, and inference of the optimal-structure sub-neural network is accelerated; the method satisfies the high performance-per-watt, low-latency and variable-precision requirements of neural network processing chips, and solves the prior-art problem that a processor cannot be applied efficiently to multi-functional, multi-platform scenarios.
Detailed description of the invention
Fig. 1 is the layer diagram of the neural network dynamic acceleration platform based on optimal structure search of the present invention.
Fig. 2 is a structural schematic diagram of the neural network dynamic acceleration platform of the present invention.
Fig. 3 is the computation flow diagram of the control neural network of the present invention.
Fig. 4 is a computation schematic of the convolution computing unit of the present invention.
Fig. 5 is a schematic of the linear computing unit and classification unit of the present invention.
Fig. 6 is a schematic of approximating the Sigmoid function by the piecewise nonlinear approximation method of the present invention.
Specific embodiment
The invention is further described below with reference to the drawings and specific embodiments.
Fig. 1 shows the layer diagram of the neural network dynamic acceleration platform based on optimal structure search proposed by the present invention (hereinafter the acceleration platform); it comprises a control layer, an application layer, a connection layer and a hardware layer.
The control layer and the application layer belong to the software level; the control layer completes the training of the control neural network and the search for the optimal structure of the sub-neural network, and also completes the retraining of the optimal-structure sub-neural network;
in the application layer, the user performs data input to the hardware acceleration terminal by calling the supported hardware programming interface;
the connection layer transmits the configuration file composed of the sub-neural network structural parameters, weight parameters, etc., and the sub-neural network inference accuracy fed back by the hardware acceleration terminal;
the hardware layer mainly provides the inference acceleration function of the sub-neural network, and includes the data input module, configuration cache module, core computing module, data output module and other modules.
Fig. 2 shows the structural schematic of the neural network dynamic acceleration platform based on optimal structure search proposed by the present invention; the acceleration platform includes a control terminal and a hardware acceleration terminal.
The control terminal is a server containing a graphics processor, used to train the control neural network; according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, the control neural network updates the structure of the sub-neural network and generates sub-neural network structural parameters; the control terminal retrains the updated sub-neural network to generate weight parameters, and generates a configuration file sent to the hardware acceleration terminal, the configuration file containing the structural parameters and weight parameters of the sub-neural network; once the sub-neural network inference accuracy returned by the hardware acceleration terminal stabilizes, the control neural network stops updating the sub-neural network, i.e. the optimal structure of the sub-neural network on the hardware acceleration terminal has been found.
The sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal; the structure of the sub-neural network includes the number of convolutional layers, the number of pooling layers, the number of fully connected layers, the structure of the convolutional layers, the structure of the pooling layers and the connection mode of each layer; the convolutional layer structure includes the number, size and stride of the convolution kernels, and the pooling layer structure includes the pooling window size and stride.
The parameters in the configuration file include:
the numbers of convolutional layers, pooling layers and fully connected layers, and the connection mode of each layer;
the structure of the convolutional layers: number, size and stride of the convolution kernels;
the structure of the pooling layers: pooling window size and stride;
the weight parameters of the retrained sub-neural network;
memory addresses and memory access modes;
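To make the contents of the configuration file concrete, the following is a minimal sketch written as Python dataclasses; the field names and grouping are assumptions for illustration, since the patent only fixes which quantities the file carries:

    from dataclasses import dataclass

    @dataclass
    class ConvLayerCfg:
        num_kernels: int   # number of convolution kernels (m)
        kernel_size: int   # convolution kernel size
        stride: int        # convolution stride (k)

    @dataclass
    class PoolLayerCfg:
        window_size: int   # pooling window size
        stride: int        # pooling stride

    @dataclass
    class ConfigFile:
        conv_layers: list[ConvLayerCfg]   # length = number of convolutional layers (l)
        pool_layers: list[PoolLayerCfg]   # length = number of pooling layers (o)
        num_fc_layers: int                # number of fully connected layers (T)
        connections: list[str]            # connection mode of each layer
        weights: dict                     # retrained weight parameters
        memory_addresses: dict            # on-chip / off-chip memory addresses
        access_modes: dict                # master / data / weight access modes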
The hardware acceleration terminal is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control terminal, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control terminal; the search for the optimal sub-neural network structure and a hardware inference accelerator for the optimal-structure sub-neural network are thereby both achieved.
The hardware acceleration terminal includes a data input module, a configuration cache module, a core computing module and a data output module; the core computing module includes convolution computing units, pooling units, linear computing units and a classification unit; the numbers of convolution computing units, pooling units and linear computing units correspond respectively to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the hardware structure of a convolution computing unit is determined by the convolutional layer structural parameters in the configuration file, and the hardware structure of a pooling unit is determined by the pooling layer structural parameters in the configuration file; the hardware connections between the units of the core computing module are determined by the connection mode of each layer in the configuration file.
The data input module splits the input data into n single-channel data blocks according to the number of input data channels; the configuration cache module caches the configuration file generated by the control terminal; the core computing module uses the m convolution kernels of a convolution computing unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride of the convolution computing unit (corresponding respectively to the kernel number and stride in the configuration file); the feature maps are passed to the pooling unit for feature sampling, after which the linear computing units and the classification unit produce the classification results; meanwhile the inference accuracy of the sub-neural network is computed, and finally the classification results and inference accuracy are output by the data output unit.
The main flow of the neural network dynamic acceleration platform design method based on optimal structure search proposed by the present invention is as follows:
S1: the control neural network in the control terminal of the neural network dynamic acceleration platform updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, and generates sub-neural network structural parameters; meanwhile the control terminal retrains the sub-neural network with the updated structure, generates the weight parameters of the sub-neural network, and finally generates the memory addresses and access modes; a configuration file is formed from the structural parameters and weight parameters of the sub-neural network and the memory addresses and access modes;
S2: the configuration file is written into the configuration cache module of the hardware acceleration terminal of the neural network dynamic acceleration platform; the numbers of convolution computing units, pooling units and linear computing units in the core computing module are updated according to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the connections between the units are updated according to the parameters in the configuration file representing the connection mode of each layer; and the structures of the convolution computing units and pooling units are updated according to the structural parameters of the convolutional layers and pooling layers, respectively;
S3: the input data are read by the data input unit and convolved by the convolution computing units; classification results are then obtained through the pooling units, linear computing units and classification unit; meanwhile the inference accuracy of the sub-neural network is computed, and finally the classification results and inference accuracy are output by the data output unit;
S4: the inference accuracy of the sub-neural network is fed back to the control terminal again; the control neural network updates the sub-neural network again according to the inference accuracy of the sub-neural network and the preset accuracy requirement, and generates new sub-neural network structural parameters; meanwhile the control terminal retrains the sub-neural network with the updated structure, generates the weight parameters of the sub-neural network, and finally generates the memory addresses and access modes; a configuration file is formed from the structural parameters and weight parameters of the sub-neural network and the memory addresses and access modes;
S5: after repeated iterations, once the sub-neural network accuracy returned by the hardware acceleration terminal remains stable, the control neural network no longer updates the structure of the sub-neural network and the structural parameters of the sub-neural network remain unchanged; when the control neural network no longer updates, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found.
In this way, optimal structure search can be performed dynamically for the sub-neural network on the hardware acceleration terminal, and inference acceleration of the optimal-structure sub-neural network is completed.
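The S1 to S5 interaction can be summarized as a simple feedback loop. The following Python sketch is illustrative only: every helper below is a hypothetical stub standing in for the control RNN, the retraining step, the configuration file and the ASIC inference, and none of these names or behaviors come from the patent:

    # Illustrative sketch of the S1-S5 loop; all helpers are hypothetical stubs.
    def controller_update(feedback_acc, required_acc):
        # S1/S4: stub control network returning sub-network structural parameters.
        return {"conv_layers": 3, "kernels": 32, "stride": 1}

    def retrain(structure):
        return {"weights": []}                        # S1: stub retrained weights

    def write_config(structure, weights):
        return {"structure": structure, **weights}    # S1: stub configuration file

    _step = [0]
    def run_inference(config):
        # S2/S3: stub hardware terminal; accuracy improves and then saturates.
        _step[0] += 1
        return 0.90 - 0.10 / _step[0]

    def optimal_structure_search(required_acc, patience=3, tol=1e-3):
        history, structure = [0.0], None
        while True:
            structure = controller_update(history[-1], required_acc)  # S1/S4
            config = write_config(structure, retrain(structure))      # S1
            history.append(run_inference(config))                     # S2/S3
            recent = history[-patience:]
            if len(history) > patience and max(recent) - min(recent) < tol:
                return structure                                      # S5: stable

    # best_structure = optimal_structure_search(required_acc=0.95)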
Specifically,
At the control terminal, the control neural network is a recurrent neural network, which produces the configuration file.
The recurrent neural network is a tree; as shown in Fig. 3, its specific computation flow is as follows:
A1) The inputs x_t and h_{t-1} are computed along two paths. One path multiplies x_t and h_{t-1} to generate the current memory cell c_t, where x_t is the sequence synthesized from the inference accuracy fed back by the sub-neural network and the accuracy requirement, and h_{t-1} is the sub-neural network structural parameters output by the control neural network at the previous step;
A2) the product of the previous step is activated with the ReLU activation function, computing ReLU(h_{t-1} × x_t);
A3) the other path first adds x_t and h_{t-1} and applies the tanh activation function, computing tanh(h_{t-1} + x_t);
A4) the result of the previous step is added to the previous memory cell c_{t-1};
A5) the ReLU activation is applied again, computing ReLU(tanh(h_{t-1} + x_t) + c_{t-1});
A6) the results of the two paths are multiplied and fed into the sigmoid function; the final output is:
h_t = sigmoid(ReLU(x_t × h_{t-1}) × ReLU(tanh(h_{t-1} + x_t) + c_{t-1}))
where h_t represents the structural parameters of the sub-neural network output by the control neural network.
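Steps A1) to A6) transcribe directly into code. The following minimal sketch assumes elementwise operations on NumPy vectors; the patent fixes only the formula, not the tensor shapes:

    import numpy as np

    def relu(v):
        return np.maximum(v, 0.0)

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def controller_cell(x_t, h_prev, c_prev):
        c_t = x_t * h_prev                             # A1: current memory cell c_t
        path1 = relu(h_prev * x_t)                     # A2: ReLU(h_{t-1} * x_t)
        path2 = relu(np.tanh(h_prev + x_t) + c_prev)   # A3-A5
        h_t = sigmoid(path1 * path2)                   # A6: new structural parameters
        return h_t, c_t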
Once the sub-neural network accuracy returned by the hardware acceleration terminal remains stable, the control neural network no longer updates the structure of the sub-neural network, and the structural parameters of the sub-neural network remain unchanged.
At the hardware acceleration terminal:
the data input module splits the input data into n single-channel data blocks according to the number of input data channels; the input data are image data, or a matrix converted from signal data;
the configuration cache module caches the configuration file generated by the control terminal, which can be read by the core computing module;
one convolution computing unit executes the computation of one convolutional layer: the m convolution kernels of each convolutional layer convolve the n data blocks in turn with stride k, generating m feature maps; this is performed l times, where n is the number of input data channels and l, m, k denote the number of convolution computing units, the number of convolution kernels in a convolution computing unit and the convolution stride, corresponding to the number of convolutional layers, the kernel number and the convolution stride in the configuration file, as shown in Fig. 4;
one pooling unit executes the computation of one pooling layer, receiving the data from the convolution computing unit and performing feature sampling; this is performed o times, where o is the number of pooling units, corresponding to the number of pooling layers in the configuration file;
one linear computing unit executes the computation of one fully connected layer, receiving the data from the pooling unit and computing a = F(Wi × c + b); this is repeated T times, and the output is passed to the classification unit to obtain the classification results, which are finally output to the data output module; here Wi is the weight parameter, T is the number of linear computing units, corresponding to the number of fully connected layers in the configuration file, c is the data output by the pooling unit, b is the bias in the sub-neural network, a is the output of the fully connected layer, and F(*) is the activation function, generally the Sigmoid or ReLU function, as shown in Fig. 5.
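As a software model of these three unit types, the following NumPy sketch is illustrative only; it assumes single 2-D data blocks, "valid" convolution and max pooling, shapes and pooling choice being assumptions rather than anything fixed by the patent:

    import numpy as np

    def conv_unit(blocks, kernels, k):
        # m kernels convolve each of the n single-channel blocks with stride k,
        # summing over blocks (channels) to produce m feature maps.
        maps = []
        for w in kernels:
            kh, kw = w.shape
            oh = (blocks[0].shape[0] - kh) // k + 1
            ow = (blocks[0].shape[1] - kw) // k + 1
            out = np.zeros((oh, ow))
            for x in blocks:
                for i in range(oh):
                    for j in range(ow):
                        out[i, j] += np.sum(x[i*k:i*k+kh, j*k:j*k+kw] * w)
            maps.append(out)
        return maps

    def pool_unit(maps, window, stride):
        # Feature sampling over each feature map (max pooling assumed here).
        out = []
        for x in maps:
            oh = (x.shape[0] - window) // stride + 1
            ow = (x.shape[1] - window) // stride + 1
            p = np.zeros((oh, ow))
            for i in range(oh):
                for j in range(ow):
                    p[i, j] = x[i*stride:i*stride+window,
                                j*stride:j*stride+window].max()
            out.append(p)
        return out

    def linear_unit(c, Wi, b, F=lambda v: np.maximum(v, 0.0)):
        # One fully connected layer: a = F(Wi x c + b), with ReLU as F here.
        return F(Wi @ c + b)

In this model, the flattened outputs of the pooling units, e.g. np.concatenate([p.ravel() for p in pool_unit(maps, 2, 2)]), would form the vector c fed to the first linear computing unit.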
The present invention approximates the Sigmoid function with a piecewise nonlinear approximation method: for x in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C, D are the coefficients of the third-order polynomial fitted to the Sigmoid function; the implementation structure is shown in Fig. 6. The process of realizing the Sigmoid function on the hardware acceleration terminal with the fitted third-order polynomials is:
(1) x in different intervals has different coefficients; the coefficients A, B, C, D of each piecewise interval are stored in advance in the on-chip memory of the hardware acceleration terminal, and the corresponding A, B, C, D are fetched according to the input x;
(2) the output is selected by a selector: when the input x is non-negative, the result Ax³ + Bx² + Cx + D is output; if the input x is negative, the result 1 - (Ax³ + Bx² + Cx + D) is output.
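These two steps can be modeled in software as below. The interval bounds are assumptions, the coefficients are fitted here with numpy.polyfit rather than taken from the patent, and the negative branch uses the symmetry sigmoid(-x) = 1 - sigmoid(x) implied by the selector:

    import numpy as np

    BOUNDS = [0.0, 1.0, 2.0, 4.0, 8.0]   # assumed piecewise intervals on [0, 8)
    COEFFS = []                          # one (A, B, C, D) tuple per interval
    for lo, hi in zip(BOUNDS[:-1], BOUNDS[1:]):
        xs = np.linspace(lo, hi, 64)
        ys = 1.0 / (1.0 + np.exp(-xs))
        COEFFS.append(tuple(np.polyfit(xs, ys, 3)))  # y = Ax^3 + Bx^2 + Cx + D

    def sigmoid_approx(x):
        ax = min(abs(x), BOUNDS[-1] - 1e-9)           # clamp to the fitted range
        idx = next(i for i, hi in enumerate(BOUNDS[1:]) if ax < hi)
        A, B, C, D = COEFFS[idx]                      # (1) fetch stored coefficients
        y = A*ax**3 + B*ax**2 + C*ax + D
        return y if x >= 0 else 1.0 - y               # (2) selector: 1 - y for x < 0

    # sigmoid_approx(1.5) is about 0.8176; sigmoid_approx(-1.5) about 0.1824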
While the hardware acceleration terminal performs sub-neural network inference acceleration, the data input module and the configuration cache module store data through the on-chip and off-chip memories of the hardware acceleration terminal, so the hardware acceleration terminal needs the addresses of the on-chip and off-chip memories. In the present invention, the memory addresses are generated by the control terminal, and the memory access modes determined by those addresses are set by the configuration information in the configuration file; the memory access modes include the master access mode, the data access mode, the weight access mode, etc.
The master access mode serves the data exchange between the on-chip and off-chip memories; the data access mode serves reading data from the on-chip memory into the data input module and storing the final classification results of the core computing module back to memory; the weight access mode serves reading weight parameter data from the on-chip memory.
Finally, it should be noted that the above specific embodiments are intended only to illustrate, not to limit, the technical solution of the present invention; although the invention has been described in detail with reference to examples, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from its spirit and scope, all of which shall fall within the scope of the claims of the present invention.

Claims (6)

1. A neural network dynamic acceleration platform, characterized by comprising: a control terminal and a hardware acceleration terminal;
the control terminal is used to train a control neural network; according to the inference accuracy fed back by a sub-neural network and a preset accuracy requirement, the control neural network updates the structure of the sub-neural network and generates sub-neural network structural parameters; the control terminal retrains the updated sub-neural network to generate weight parameters, and generates a configuration file sent to the hardware acceleration terminal, the configuration file containing the structural parameters and weight parameters of the sub-neural network; once the sub-neural network inference accuracy returned by the hardware acceleration terminal stabilizes, the control neural network stops updating the sub-neural network, i.e. the optimal structure of the sub-neural network on the hardware acceleration terminal has been found;
the sub-neural network is the neural network whose inference is to be accelerated on the hardware acceleration terminal;
the hardware acceleration terminal is a sub-neural network hardware inference accelerator implemented as an ASIC; it receives the configuration file generated by the control terminal, completes the hardware realization of the sub-neural network, accelerates the inference process of the sub-neural network, and feeds the inference accuracy of the sub-neural network back to the control terminal; the search for the optimal sub-neural network structure and a hardware inference accelerator for the optimal-structure sub-neural network are thereby both achieved.
2. The neural network dynamic acceleration platform according to claim 1, characterized in that
the structure of the sub-neural network includes the number of convolutional layers, the number of pooling layers, the number of fully connected layers, the structure of the convolutional layers, the structure of the pooling layers and the connection mode of each layer; the convolutional layer structure includes the number, size and stride of the convolution kernels; the pooling layer structure includes the pooling window size and stride;
the parameters in the configuration file include:
the numbers of convolutional layers, pooling layers and fully connected layers, and the connection mode of each layer;
the structure of the convolutional layers: number, size and stride of the convolution kernels;
the structure of the pooling layers: pooling window size and stride;
the weight parameters of the retrained sub-neural network;
memory addresses and memory access modes;
the hardware acceleration terminal includes a data input module, a configuration cache module, a core computing module and a data output module; the core computing module includes convolution computing units, pooling units, linear computing units and a classification unit; the numbers of convolution computing units, pooling units and linear computing units correspond respectively to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the hardware structure of a convolution computing unit is determined by the convolutional layer structural parameters in the configuration file, and the hardware structure of a pooling unit is determined by the pooling layer structural parameters in the configuration file; the hardware connections between the units of the core computing module are determined by the connection mode of each layer in the configuration file;
the data input module splits the input data into n single-channel data blocks according to the number of input data channels; the configuration cache module caches the configuration file generated by the control terminal; the core computing module uses the m convolution kernels of a convolution computing unit to convolve the n data blocks in turn with stride k, generating m feature maps, where m and k denote the number of convolution kernels and the convolution stride of the convolution computing unit, corresponding respectively to the kernel number and stride in the configuration file; the feature maps are passed to the pooling unit for feature sampling, after which the linear computing units and the classification unit produce the classification results; meanwhile the inference accuracy of the sub-neural network is computed, and finally the classification results and inference accuracy are output by the data output unit.
3. The neural network dynamic acceleration platform according to claim 1, characterized in that
the control neural network at the control terminal is a recurrent neural network, whose computation flow is as follows:
A1) the inputs x_t and h_{t-1} are computed along two paths; one path multiplies x_t and h_{t-1} to generate the current memory cell c_t, where x_t is the sequence synthesized from the inference accuracy fed back by the sub-neural network and the accuracy requirement, and h_{t-1} is the sub-neural network structural parameters output by the control neural network at the previous step;
A2) the product of the previous step is activated with the ReLU activation function, computing ReLU(h_{t-1} × x_t);
A3) the other path first adds x_t and h_{t-1} and applies the tanh activation function, computing tanh(h_{t-1} + x_t);
A4) the result of the previous step is added to the previous memory cell c_{t-1};
A5) the ReLU activation is applied again, computing ReLU(tanh(h_{t-1} + x_t) + c_{t-1});
A6) the results of the two paths are multiplied and fed into the sigmoid function; the final output is:
h_t = sigmoid(ReLU(x_t × h_{t-1}) × ReLU(tanh(h_{t-1} + x_t) + c_{t-1}))
where h_t represents the structural parameters of the sub-neural network output by the control neural network.
4. The neural network dynamic acceleration platform according to claim 2, characterized in that,
at the hardware acceleration terminal:
the data input module splits the input data into n single-channel data blocks according to the number of input data channels;
the configuration cache module caches the configuration file generated by the control terminal, which is read by the core computing module;
one convolution computing unit executes the computation of one convolutional layer: the m convolution kernels of each convolutional layer convolve the n data blocks in turn with stride k, generating m feature maps; this is performed l times, where n is the number of input data channels and l, m, k denote the number of convolution computing units, the number of convolution kernels in a convolution computing unit and the convolution stride, corresponding to the number of convolutional layers, the kernel number and the stride in the configuration file;
one pooling unit executes the computation of one pooling layer, receiving the data from the convolution computing unit and performing feature sampling; this is performed o times, where o is the number of pooling units, corresponding to the number of pooling layers in the configuration file;
one linear computing unit executes the computation of one fully connected layer, receiving the data from the pooling unit and computing a = F(Wi × c + b); this is repeated T times, and the output is passed to the classification unit to obtain the classification results, which are finally output to the data output module; here Wi is the weight parameter, T is the number of linear computing units, corresponding to the number of fully connected layers in the configuration file, c is the data output by the pooling unit, b is the bias in the sub-neural network, a is the output of the fully connected layer, and F(*) represents the activation function.
5. The neural network dynamic acceleration platform according to claim 4, characterized in that
F(*) is the Sigmoid or ReLU function; at the hardware acceleration terminal, the Sigmoid function is approximated using a piecewise nonlinear approximation method: for x in different intervals, different third-order polynomials y = Ax³ + Bx² + Cx + D are used to fit the Sigmoid function, where A, B, C, D are the coefficients of the third-order polynomial fitted to the Sigmoid function; the process of realizing the Sigmoid function on the hardware acceleration terminal with the fitted third-order polynomials is:
(1) x in different intervals has different coefficients; the coefficients A, B, C, D of each piecewise interval are stored in advance in the on-chip memory of the hardware acceleration terminal, and the corresponding A, B, C, D are fetched according to the input x;
(2) the output is selected by a selector: when the input x is non-negative, the output is Ax³ + Bx² + Cx + D; if the input x is negative, the output is 1 - (Ax³ + Bx² + Cx + D).
6. A neural network dynamic acceleration platform design method based on optimal structure search, characterized by comprising:
S1: the control neural network in the control terminal of the neural network dynamic acceleration platform updates the structure of the sub-neural network according to the inference accuracy fed back by the sub-neural network and the preset accuracy requirement, and generates sub-neural network structural parameters; meanwhile the control terminal retrains the sub-neural network with the updated structure, generates the weight parameters of the sub-neural network, and finally generates the memory addresses and access modes; a configuration file is formed from the structural parameters and weight parameters of the sub-neural network and the memory addresses and access modes;
S2: the configuration file is written into the configuration cache module of the hardware acceleration terminal of the neural network dynamic acceleration platform; the numbers of convolution computing units, pooling units and linear computing units in the core computing module are updated according to the numbers of convolutional layers, pooling layers and fully connected layers in the configuration file; the connections between the units are updated according to the parameters in the configuration file representing the connection mode of each layer; and the structures of the convolution computing units and pooling units are updated according to the structural parameters of the convolutional layers and pooling layers, respectively;
S3: the input data are read by the data input unit and convolved by the convolution computing units; classification results are then obtained through the pooling units, linear computing units and classification unit; meanwhile the inference accuracy of the sub-neural network is computed, and finally the classification results and inference accuracy are output by the data output unit;
S4: the inference accuracy of the sub-neural network is fed back to the control terminal again; the control neural network updates the sub-neural network again according to the inference accuracy of the sub-neural network and the preset accuracy requirement, and generates new sub-neural network structural parameters; meanwhile the control terminal retrains the sub-neural network with the updated structure, generates the weight parameters of the sub-neural network, and finally generates the memory addresses and access modes; a configuration file is formed from the structural parameters and weight parameters of the sub-neural network and the memory addresses and access modes;
S5: after repeated iterations, once the sub-neural network accuracy returned by the hardware acceleration terminal remains stable, the control neural network no longer updates the structure of the sub-neural network and the structural parameters of the sub-neural network remain unchanged; when the control neural network no longer updates, the optimal structure of the sub-neural network on the hardware acceleration terminal has been found.
CN201910175975.0A 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform Active CN109934336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910175975.0A CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Publications (2)

Publication Number Publication Date
CN109934336A (en) 2019-06-25
CN109934336B (en) 2023-05-16

Family

ID=66986513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910175975.0A Active CN109934336B (en) 2019-03-08 2019-03-08 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform

Country Status (1)

Country Link
CN (1) CN109934336B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247992A (en) * 2014-12-30 2017-10-13 合肥工业大学 A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108764466A (en) * 2018-03-07 2018-11-06 东南大学 Convolutional neural networks hardware based on field programmable gate array and its accelerated method
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2721181C1 (en) * 2019-08-19 2020-05-18 Бейджин Сяоми Интеллиджент Текнолоджи Ко., Лтд. Super network construction method, method of use, device and data medium
US11803475B2 (en) 2019-09-03 2023-10-31 Inspur Electronic Information Industry Co., Ltd. Method and apparatus for data caching
CN110673786A (en) * 2019-09-03 2020-01-10 浪潮电子信息产业股份有限公司 Data caching method and device
CN112561028A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Method for training neural network model, and method and device for data processing
WO2021081809A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Network architecture search method and apparatus, and storage medium and computer program product
CN112106077A (en) * 2019-10-30 2020-12-18 深圳市大疆创新科技有限公司 Method, apparatus, storage medium, and computer program product for network structure search
CN111027689A (en) * 2019-11-20 2020-04-17 中国航空工业集团公司西安航空计算技术研究所 Configuration method, device and computing system
CN111027689B (en) * 2019-11-20 2024-03-22 中国航空工业集团公司西安航空计算技术研究所 Configuration method, device and computing system
CN111340221A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Method and device for sampling neural network structure
CN111340221B (en) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 Neural network structure sampling method and device
CN111782398A (en) * 2020-06-29 2020-10-16 上海商汤智能科技有限公司 Data processing method, device and system and related equipment
CN111931926A (en) * 2020-10-12 2020-11-13 南京风兴科技有限公司 Hardware acceleration system and control method for convolutional neural network CNN
CN112613605A (en) * 2020-12-07 2021-04-06 深兰人工智能(深圳)有限公司 Neural network acceleration control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109934336B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN109934336A (en) Neural network dynamic based on optimum structure search accelerates platform designing method and neural network dynamic to accelerate platform
CN109063825A (en) Convolutional neural networks accelerator
CN110390383A (en) A kind of deep neural network hardware accelerator based on power exponent quantization
CN110059811A (en) Weight buffer
CN108898554A (en) Improve the method and Related product of image resolution ratio
CN109032781A (en) A kind of FPGA parallel system of convolutional neural networks algorithm
CN110458279A (en) A kind of binary neural network accelerated method and system based on FPGA
WO2020156508A1 (en) Method and device for operating on basis of chip with operation array, and chip
CN108764466A (en) Convolutional neural networks hardware based on field programmable gate array and its accelerated method
CN110163353A (en) A kind of computing device and method
CN110533183A (en) The model partition and task laying method of heterogeneous network perception in a kind of assembly line distribution deep learning
CN106127302A (en) Process the circuit of data, image processing system, the method and apparatus of process data
CN110383300A (en) A kind of computing device and method
CN108647184A (en) A kind of Dynamic High-accuracy bit convolution multiplication Fast implementation
CN108491924B (en) Neural network data serial flow processing device for artificial intelligence calculation
CN110222835A (en) A kind of convolutional neural networks hardware system and operation method based on zero value detection
CN109993293A (en) A kind of deep learning accelerator suitable for stack hourglass network
CN111091183B (en) Neural network acceleration system and method
CN115470889A (en) Network-on-chip autonomous optimal mapping exploration system and method based on reinforcement learning
CN110377874A (en) Convolution algorithm method and system
CN110414672A (en) Convolution algorithm method, apparatus and system
CN114445607A (en) Storage and calculation integrated low-power-consumption integrated image recognition system and method
CN109359542A (en) The determination method and terminal device of vehicle damage rank neural network based
CN109389210A (en) Processing method and processing unit
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant