CN108416438A - Convolutional neural network hardware module deployment method - Google Patents
Convolutional neural network hardware module deployment method
- Publication number
- CN108416438A CN108416438A CN201810539816.XA CN201810539816A CN108416438A CN 108416438 A CN108416438 A CN 108416438A CN 201810539816 A CN201810539816 A CN 201810539816A CN 108416438 A CN108416438 A CN 108416438A
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural networks
- module
- hardware
- hardware resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The present invention discloses a convolutional neural network hardware module deployment method, relating to the field of convolutional neural network implementation. An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
Description
Technical field
The present invention discloses a convolutional neural network hardware module deployment method, relating to the field of convolutional neural network implementation.
Background technology
A convolutional neural network (CNN) is a variant of the multi-layer perceptron (MLP). It grew out of a biological observation: cells in the visual cortex cover the entire visual field in a patchwise manner and, like filters, are locally sensitive to the input image, which lets them extract the spatial relationships of targets in natural images particularly well. A CNN mines this spatially local association of targets of interest in natural images by strengthening the local connection pattern between nodes of adjacent layers in the network.
At present CNNs are mostly deployed on x86 CPU + GPU hardware platforms, running software frameworks such as TensorFlow or Caffe on an operating system in real time; training and inference begin once the model parameters are set. At the application end, because of resource and other limitations, often only smaller models can be run, model execution is inefficient, and recognition speed is far below that of the PC side. The invention discloses a convolutional neural network hardware module deployment method that realizes a given convolutional neural network model in the form of hardware circuits. By combining the target hardware resources with the volume of the neural network model and trading off resources against speed requirements, it allocates quantities to the hardware modules and partitions their connections, so that hardware logic and storage resources are fully utilized and both the deployment of each network layer and the conversion between layers of different structure are completed efficiently and quickly. Various neural-network hardware structures can be realized flexibly, and cutting-edge research can be converted into hardware implementations quickly, improving development efficiency.
Summary of the invention
Aiming at the problems of the prior art, the present invention provides a convolutional neural network hardware module deployment method. Using hardware resources of a specific structure and adapting to the characteristics of different neural networks, it deploys a CNN to the application end in real time, which can greatly unlock the value of the neural network.
The concrete scheme proposed by the present invention is:
A convolutional neural network hardware module deployment method:
An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
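As a rough illustration of the simulate-and-balance step described above, the following sketch enumerates candidate deployments, scores each with toy cost models for resource usage and speed, and keeps the fastest candidate that fits the hardware budget. All names and cost formulas here (`Candidate`, `estimate_resources`, `estimate_cycles`, `choose_deployment`) are assumptions made for illustration, not part of the patent.

```python
# Illustrative sketch (not from the patent): the upper-layer compiler's
# simulate-and-balance step over candidate hardware deployments.
from dataclasses import dataclass


@dataclass
class Candidate:
    conv_units: int   # number of convolution basic units
    pool_units: int   # number of pooling units
    unit_rows: int    # input rows processed per convolution unit


def estimate_resources(c: Candidate) -> int:
    # assumed linear logic-cost model for the candidate deployment
    return c.conv_units * c.unit_rows * 10 + c.pool_units * 5


def estimate_cycles(c: Candidate, model_ops: int) -> int:
    # idealized speed model: more parallel capacity -> fewer cycles
    return model_ops // max(1, c.conv_units * c.unit_rows)


def choose_deployment(candidates, budget, model_ops):
    # balance resources against speed: among deployments that fit the
    # hardware budget, pick the fastest one
    feasible = [c for c in candidates if estimate_resources(c) <= budget]
    return min(feasible, key=lambda c: estimate_cycles(c, model_ops))
```

For example, with a budget of 6000 resource units, a 64-unit candidate that overshoots the budget is rejected in favour of the fastest candidate that still fits.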
In the method, the hardware modules of the convolutional neural network comprise basic modules of the convolutional neural network and model-specific modules determined by the features of the convolutional neural network model, wherein the basic modules include a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, the basic-unit size and quantity of the convolution module are determined, and it is determined whether the basic units of the convolution module are combined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the basic-unit size and quantity of the pooling module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the internal tree structure, quantity, and number of caches of the accumulation module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the capacity and storage form of the data access module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the connections between the instruction parsing and distribution module and the other modules, together with the instruction-execution feedback mechanism, are determined; the deployment parameters of the hardware modules are stored as instructions in the instruction parsing and distribution module and are sent to the other hardware modules as feedback information during instruction execution.
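The paragraph above stores deployment parameters as instructions inside the instruction parsing and distribution module. A minimal sketch of such parameter instructions, under an assumed 32-bit word layout (4-bit module id, 8-bit parameter id, 20-bit value) that is purely illustrative and not specified by the patent:

```python
# Illustrative sketch: packing a deployment parameter into an instruction
# word for the instruction parsing and distribution module.  The layout
# (4-bit module id | 8-bit parameter id | 20-bit value) is an assumption.

def encode_param_instruction(module_id: int, param_id: int, value: int) -> int:
    """Pack (module, parameter, value) into a single 32-bit word."""
    assert 0 <= module_id < 16 and 0 <= param_id < 256 and 0 <= value < 2 ** 20
    return (module_id << 28) | (param_id << 20) | value


def decode_param_instruction(word: int):
    """Unpack an instruction word back into (module_id, param_id, value)."""
    return (word >> 28) & 0xF, (word >> 20) & 0xFF, word & 0xFFFFF
```

A module receiving such a word can recover its parameter by decoding, which is one simple way the stored parameters could be distributed as feedback information during instruction execution.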
In the method, the model-specific modules include an expansion module, a residual module, and a compression module.
The beneficial effects of the present invention are:
The present invention provides a convolutional neural network hardware module deployment method. By combining the target hardware resources with the volume of the neural network model and trading off resources against speed requirements, it allocates quantities to the hardware modules and partitions their connections, making full use of hardware logic and storage resources and efficiently and quickly completing the deployment of each layer of the convolutional neural network as well as the conversion between layers of different structure. Various neural-network hardware structures can be realized flexibly, and cutting-edge research can be converted into hardware implementations quickly, improving development efficiency.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware module deployment relations of the present invention;
Fig. 2 is a deployment schematic diagram of an embodiment of the present invention.
Specific embodiments
The present invention provides a convolutional neural network hardware module deployment method:
An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
The present invention is further described below in conjunction with the drawings and specific embodiments.
Among the hardware modules of the convolutional neural network in the present invention, the basic modules of the convolutional neural network include, but are not limited to: a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module, each used to realize one step or a few steps of the convolutional neural network's operations;
the model-specific modules determined by the features of the convolutional neural network model include, but are not limited to: an expansion module, a residual module, a compression module, etc.
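The idea that each basic module realizes one step, or a few steps, of the network's computation, with modules then chained according to the deployment's connection plan, can be sketched as follows. The stage functions are placeholders standing in for the hardware modules, not their actual behaviour:

```python
# Placeholder sketch: basic modules as composable pipeline stages.

def convolution(x):
    # convolution module (stand-in transform, not a real convolution)
    return [v * 2 for v in x]


def accumulate(x):
    # accumulation module: sum partial channel results into one value
    return [sum(x)]


def pool(x):
    # pooling module (stand-in reduction)
    return [max(x)]


def run_pipeline(x, stages):
    # chain the modules in the order given by the connection plan
    for stage in stages:
        x = stage(x)
    return x
```

`run_pipeline([1, 2, 3], [convolution, accumulate, pool])` chains the three stand-in stages in the order convolution, accumulation, pooling, mirroring the data path of the embodiment below.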
The parameters of each hardware module are determined by the upper-layer compiler, which, according to the target hardware resources and the volume of the neural network model, allocates the quantity and connections of each module and, based on the features of the convolutional neural network model, determines: the basic-unit size and quantity of the convolution module and whether its units are combined; the basic-unit size and quantity of the pooling module; the connection form, internal tree structure, quantity, and number of caches of the accumulation module; the capacity, connection form, and storage form of the data access module; the connections between the instruction parsing and distribution module and the other modules, the instruction-execution feedback mechanism, and other such parameters. These parameters are stored as instructions in the instruction parsing and distribution module and are sent to the other modules as feedback information during instruction execution.
The parameters of each hardware module can be optimized through manual modification, and the hardware module deployment can be changed by the upper-layer compiler at any time to adapt to different neural network models.
In a practical application, as in Fig. 2, the target hardware resources are divided into 64 basic convolution computation units, each of which can perform convolution between a 16-row slice of the input feature map and a convolution kernel.
This layer of the network must complete the convolution from 256x256x64 to 256x256x128 together with the non-linear and pooling computations. The input feature map is partitioned into blocks of 16 rows each, with 2 rows repeated between adjacent blocks so that block-edge information is not lost.
Each basic convolution unit is loaded with 4 convolution kernels to utilize the target hardware resources maximally. Block ① is first convolved with the first 4 kernels; traversing all 128 kernels takes 128/4 = 32 rounds of computation, completing the convolution and non-linear operations of block ① over all input channels. All blocks of the input channels are then traversed, completing the computation over the whole input.
The intermediate results of the 64 basic convolution units are fed, 64 at a time, into the accumulation units, which perform the input-channel accumulation in parallel; the result enters the pooling unit and, after pooling, is written back through the data access module.
After all computation required for this layer is completed, a new round of instructions is issued by the instruction parsing and distribution module, redeploying each hardware module with all parameters updated to meet the computation requirements of the next layer of the convolutional neural network.
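The arithmetic of this worked example can be checked with a short sketch, assuming the stated numbers (16-row blocks with a 2-row overlap over a 256-row feature map, 4 kernels per unit, 128 kernels in total); `plan_blocks` is an illustrative helper, not taken from the patent:

```python
# Checking the worked example's numbers (an illustrative reconstruction).

def plan_blocks(height: int, block_rows: int = 16, overlap: int = 2):
    """Partition feature-map rows into overlapping blocks of 16 rows."""
    step = block_rows - overlap            # 14 fresh rows per block
    starts = range(0, height - overlap, step)
    # clamp the final block to the feature-map height
    return [(s, min(s + block_rows, height)) for s in starts]


total_kernels, kernels_per_unit = 128, 4
rounds = total_kernels // kernels_per_unit  # 128 / 4 = 32 rounds per block
blocks = plan_blocks(256)
```

Under these assumptions, each pair of adjacent blocks shares exactly 2 rows, so block-edge information is preserved, and traversing all 128 kernels indeed takes 32 rounds.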
Claims (8)
1. A convolutional neural network hardware module deployment method, characterized in that:
an upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
2. The method according to claim 1, characterized in that the hardware modules of the convolutional neural network comprise basic modules of the convolutional neural network and model-specific modules determined by the features of the convolutional neural network model, wherein the basic modules include a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module.
3. The method according to claim 2, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, the basic-unit size and quantity of the convolution module are determined, and it is determined whether the basic units of the convolution module are combined.
4. The method according to claim 2 or 3, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the basic-unit size and quantity of the pooling module are determined.
5. The method according to claim 4, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the internal tree structure, quantity, and number of caches of the accumulation module are determined.
6. The method according to claim 2, 3 or 5, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the capacity and storage form of the data access module are determined.
7. The method according to claim 6, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, the connections between the instruction parsing and distribution module and the other modules and the instruction-execution feedback mechanism are determined, and the deployment parameters of the hardware modules are stored as instructions in the instruction parsing and distribution module and sent to the other hardware modules as feedback information during instruction execution.
8. The method according to claim 7, characterized in that the model-specific modules include an expansion module, a residual module, and a compression module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810539816.XA CN108416438A (en) | 2018-05-30 | 2018-05-30 | Convolutional neural network hardware module deployment method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108416438A true CN108416438A (en) | 2018-08-17 |
Family
ID=63140938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810539816.XA Pending CN108416438A (en) | 2018-05-30 | 2018-05-30 | Convolutional neural network hardware module deployment method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416438A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106355244A (en) * | 2016-08-30 | 2017-01-25 | 深圳市诺比邻科技有限公司 | CNN (convolutional neural network) construction method and system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376849A (en) * | 2018-09-26 | 2019-02-22 | 旺微科技(上海)有限公司 | A kind of control method and device of convolutional neural networks system |
WO2021056677A1 (en) * | 2019-09-27 | 2021-04-01 | 东南大学 | Dual-phase coefficient adjustable analog multiplication calculation circuit for convolutional neural network |
CN111104124A (en) * | 2019-11-07 | 2020-05-05 | 北京航空航天大学 | Method for rapid deployment of a convolutional neural network on an FPGA based on the PyTorch framework |
CN111104124B (en) * | 2019-11-07 | 2021-07-20 | 北京航空航天大学 | Method for rapid deployment of a convolutional neural network on an FPGA based on the PyTorch framework |
CN111897660A (en) * | 2020-09-29 | 2020-11-06 | 深圳云天励飞技术股份有限公司 | Model deployment method, model deployment device and terminal equipment |
CN111897660B (en) * | 2020-09-29 | 2021-01-15 | 深圳云天励飞技术股份有限公司 | Model deployment method, model deployment device and terminal equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108416438A (en) | Convolutional neural network hardware module deployment method | |
CN109117953B (en) | Network parameter training method and system, server, client and storage medium | |
CN107918794A (en) | Neural network processor based on computing array | |
CN108090560A (en) | The design method of LSTM recurrent neural network hardware accelerators based on FPGA | |
CN106228238B (en) | Accelerate the method and system of deep learning algorithm on field programmable gate array platform | |
CN107578095B (en) | Neural computing device and processor comprising the computing device | |
CN107038064A (en) | Virtual machine management method and device, storage medium | |
CN107343025B (en) | Delay optimization method under distributed satellite cloud and mist network architecture and energy consumption constraint | |
CN104077797B (en) | three-dimensional game animation system | |
CN109523621A (en) | Loading method and device, storage medium, the electronic device of object | |
CN114172937B (en) | Dynamic service function chain arrangement method and system based on deep reinforcement learning | |
CN107609652A (en) | Perform the distributed system and its method of machine learning | |
CN112100155A (en) | Cloud edge cooperative digital twin model assembling and fusing method | |
JP2009512048A (en) | Method, apparatus and program for transmitting a roof and building structure in a three-dimensional representation of a building roof based on the roof and building structure | |
CN107818367A (en) | Processing system and processing method for neutral net | |
CN110069815A (en) | Index system construction method, system and terminal device | |
CN110251942A (en) | Control the method and device of virtual role in scene of game | |
Shopf et al. | March of the Froblins: simulation and rendering massive crowds of intelligent and detailed creatures on GPU | |
CN113590232B (en) | Relay edge network task unloading method based on digital twinning | |
CN109558901A (en) | A kind of semantic segmentation training method and device, electronic equipment, storage medium | |
CN112581578A (en) | Cloud rendering system based on software definition | |
CN114691765A (en) | Data processing method and device in artificial intelligence system | |
CN108196951B (en) | GPU basin runoff simulation distributed scheduling system and method | |
JP2007052775A5 (en) | ||
CN114641041A (en) | Edge-intelligent-oriented Internet of vehicles slicing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180817 |