CN108416438A - Convolutional neural network hardware module deployment method - Google Patents

Convolutional neural network hardware module deployment method

Info

Publication number
CN108416438A
CN108416438A
Authority
CN
China
Prior art keywords
convolutional neural
neural networks
module
hardware
hardware resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810539816.XA
Other languages
Chinese (zh)
Inventor
王子彤
姜凯
聂林川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd filed Critical Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN201810539816.XA priority Critical patent/CN108416438A/en
Publication of CN108416438A publication Critical patent/CN108416438A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The present invention discloses a convolutional neural network hardware module deployment method, relating to the field of convolutional neural network implementation. An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, simulates and compares implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby deploying the convolutional neural network hardware modules.

Description

Convolutional neural network hardware module deployment method
Technical field
The present invention discloses a convolutional neural network hardware module deployment method, relating to the field of convolutional neural network implementation.
Background technology
A convolutional neural network (CNN) is a variant of the multi-layer perceptron (MLP). It originated from biological concepts: cells in the visual cortex cover the entire visual field in a particular way and, like filters, are locally sensitive to the input image, so they can better extract the spatial relationships of targets in natural images. A CNN mines the local spatial associations of targets of interest in natural images by reinforcing the local connection patterns between nodes of adjacent layers in the neural network.
At present, CNNs are mostly deployed on x86 CPU + GPU hardware platforms, running software frameworks such as TensorFlow or Caffe on a real-time operating system; after the model parameters are set, training and inference begin. At the application end, due to resource limitations, often only smaller models can be run, the models run inefficiently, and recognition speed is far below PC level. The present invention discloses a convolutional neural network hardware module deployment method that implements a given convolutional neural network model in hardware circuitry. By combining the target hardware resources with the volume of the neural network model and trading off resource and speed requirements, it allocates quantities and partitions connections for the hardware modules, so that hardware logic and storage resources can be fully utilized to efficiently and rapidly complete the deployment of each layer of the convolutional neural network and the conversion between layers of different structure. Various neural hardware structures can be realized flexibly, cutting-edge research can be converted to hardware implementations quickly, and development efficiency is improved.
Invention content
In view of the problems in the prior art, the present invention provides a convolutional neural network hardware module deployment method. By using hardware resources of a specific structure and targeting different neural network features, deploying a CNN to the application end in real time can greatly realize the value of the neural network.
The concrete scheme proposed by the present invention is as follows:
A convolutional neural network hardware module deployment method:
An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, simulates and compares implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby deploying the convolutional neural network hardware modules.
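The compiler-side trade-off described here can be pictured as a small search over candidate deployments. The following Python sketch is illustrative only — the module names, resource weights, candidate grid, and latency model are assumptions for the example, not part of the disclosure:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Deployment:
    conv_units: int   # number of convolution basic units
    pool_units: int   # number of pooling basic units
    buffers: int      # caches available to the accumulation module

def resource_cost(d: Deployment) -> int:
    # Toy resource model: logic consumed per unit (illustrative weights).
    return d.conv_units * 100 + d.pool_units * 40 + d.buffers * 10

def latency(d: Deployment, macs: int) -> float:
    # Toy speed model: more convolution units -> fewer cycles.
    return macs / d.conv_units

def choose_deployment(resource_budget: int, macs: int) -> Deployment:
    """Simulate and compare candidate deployments, balancing hardware
    resources against CNN speed, as the upper-layer compiler would."""
    candidates = [
        Deployment(c, p, b)
        for c, p, b in product([16, 32, 64], [4, 8], [2, 4])
        if resource_cost(Deployment(c, p, b)) <= resource_budget
    ]
    # Among feasible deployments, pick the fastest.
    return min(candidates, key=lambda d: latency(d, macs))

best = choose_deployment(resource_budget=8000, macs=1_000_000)
```

In this toy model the compiler would select the largest convolution-unit count that fits the budget; a real implementation would score each candidate with a cycle-accurate simulation rather than a closed-form estimate.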
In the method, the hardware modules of the convolutional neural network include basic modules of the convolutional neural network and specific modules determined by the features of the convolutional neural network model, wherein the basic modules include a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, the size and quantity of the basic units of the convolution module are determined, and it is decided whether the basic units of the convolution module are combined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the size and quantity of the basic units of the pooling module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the internal tree structure, quantity, and number of caches of the accumulation module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the capacity and storage form of the data access module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the connections between the instruction parsing and distribution module and the other modules, as well as the instruction execution feedback mechanism, are determined; the deployment parameters of the hardware modules are stored as instructions in the instruction parsing and distribution module and are sent as feedback information to the other hardware modules according to instruction execution.
In the method, the specific modules include an expansion module, a residual module, and a compression module.
The beneficial effects of the present invention are as follows:
The present invention provides a convolutional neural network hardware module deployment method. By combining the target hardware resources with the volume of the neural network model and trading off resource and speed requirements, quantities are allocated and connections are partitioned for the hardware modules, so that hardware logic and storage resources can be fully utilized to efficiently and rapidly complete the deployment of each layer of the convolutional neural network and the conversion between layers of different structure. Various neural hardware structures can be realized flexibly, cutting-edge research can be converted to hardware implementations quickly, and development efficiency is improved.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware module deployment relationship of the present invention;
Fig. 2 is a schematic diagram of the deployment of an embodiment of the present invention.
Detailed description of the embodiments
The present invention provides a convolutional neural network hardware module deployment method:
An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, simulates and compares implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby deploying the convolutional neural network hardware modules.
The present invention will be further described below in conjunction with the drawings and specific embodiments.
Among the hardware modules of the convolutional neural network in the present invention, the basic modules of the convolutional neural network include, but are not limited to: a convolution module, a pooling module, an accumulation module, a data access module, an instruction parsing and distribution module, etc., each used to implement one or more steps of the convolutional neural network operation;
The specific modules determined by the features of the convolutional neural network model include, but are not limited to: an expansion module, a residual module, a compression module, etc.
The parameters of each hardware module — the quantity and connections of each module being allocated by the upper-layer compiler according to the target hardware resources, the volume of the neural network model, and the features of the convolutional neural network model — include: the size and quantity of the convolution module basic units and whether they are combined; the size and quantity of the pooling module basic units; the connection form, internal tree structure, quantity, and number of caches of the accumulation module; the capacity, connection form, and storage form of the data access module; and the connections between the instruction parsing and distribution module and the other modules as well as the instruction execution feedback mechanism. The above parameters are stored as instructions in the instruction parsing and distribution module and are sent as feedback information to each of the other modules according to instruction execution;
The parameters of each hardware module can be optimized by manual modification; the hardware module deployment can be changed at any time by the upper-layer compiler to adapt to different neural network models.
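The idea of keeping the deployment parameters as instructions inside the instruction parsing and distribution module, to be forwarded to the other modules, can be sketched as follows. This is a hypothetical software analogy — the class names, parameter keys, and dispatch mechanism are illustrative assumptions, not the patent's hardware interface:

```python
from dataclasses import dataclass, asdict

@dataclass
class ModuleParams:
    name: str      # which hardware module this instruction targets
    params: dict   # that module's deployment parameters

class InstructionStore:
    """Hypothetical stand-in for the instruction parsing and distribution
    module: deployment parameters are held as instructions and forwarded
    to the other modules during execution."""

    def __init__(self):
        self.instructions = []

    def load(self, module_params):
        # Store each module's deployment parameters as an instruction.
        self.instructions = [asdict(p) for p in module_params]

    def dispatch(self):
        # Forward the stored parameters to each module (here: yield them).
        yield from self.instructions

store = InstructionStore()
store.load([
    ModuleParams("conv",  {"unit_size": 16, "count": 64, "combined": False}),
    ModuleParams("pool",  {"unit_size": 16, "count": 8}),
    ModuleParams("accum", {"tree_depth": 6, "count": 1, "caches": 4}),
])
configs = list(store.dispatch())
```

Because the parameters live in one place, redeploying for a different network model amounts to loading a new instruction set, matching the description above that the deployment can be changed at any time by the upper-layer compiler.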
In practical application, as in Fig. 2, the target hardware resources are divided into 64 convolution computing basic units, each of which can perform a convolution operation on 16 rows of the input feature map with a convolution kernel;
This layer of the network is to complete the convolution from 256x256x64 to 256x256x128, together with the nonlinear and pooling computation. The input feature map is divided into blocks of 16 rows each, with 2 rows repeated between adjacent blocks to ensure that block edge information is not lost;
Each convolution basic computing unit is loaded with 4 convolution kernels to maximally utilize the target hardware resources. First, the convolution results of block 1 with the first 4 kernels are computed; traversing all 128 convolution kernels requires 128/4 = 32 rounds of computation, completing the convolution and nonlinear operations of block 1 for all input channels. All blocks of the input channels are then traversed to complete the computation for the whole input feature map;
The intermediate results of the 64 convolution basic units are fed, 64 per group, into the accumulation unit, which performs the input-channel accumulation in parallel; the result enters the pooling unit and, after pooling, is written out through the data access module;
After all computation required for this layer is completed, a new round of instructions is issued by the instruction parsing and distribution module, redeploying each hardware module with all parameters updated to adapt to the computation of the next layer of the convolutional neural network.
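The arithmetic of this worked example can be checked with a short script. Only the figures stated in the embodiment (256-row map, 128 kernels, 16-row blocks, 2-row overlap, 4 kernels per unit) are taken from the text; the block-count formula is an inference from the stated overlap, not given explicitly:

```python
import math

# Figures taken from the embodiment above.
rows = 256               # input feature map height
out_channels = 128       # number of convolution kernels in this layer
block_rows = 16          # rows per block fed to a convolution unit
overlap = 2              # rows repeated between adjacent blocks
kernels_per_unit = 4     # kernels loaded into each basic unit

# Each round computes a block against 4 kernels, so covering all
# 128 kernels takes 128 / 4 = 32 rounds, as stated in the embodiment.
rounds = out_channels // kernels_per_unit

# Adjacent blocks advance by 16 - 2 = 14 fresh rows; the resulting
# block count for a 256-row map is inferred, not stated in the text.
stride = block_rows - overlap
blocks = math.ceil((rows - block_rows) / stride) + 1
```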

Claims (8)

1. A convolutional neural network hardware module deployment method, characterized in that:
an upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, simulates and compares implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby deploying the convolutional neural network hardware modules.
2. The method according to claim 1, characterized in that the hardware modules of the convolutional neural network include basic modules of the convolutional neural network and specific modules determined by the features of the convolutional neural network model, wherein the basic modules include a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module.
3. The method according to claim 2, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, the size and quantity of the basic units of the convolution module are determined, and it is decided whether the basic units of the convolution module are combined.
4. The method according to claim 2 or 3, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the size and quantity of the basic units of the pooling module are determined.
5. The method according to claim 4, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the internal tree structure, quantity, and number of caches of the accumulation module are determined.
6. The method according to claim 2, 3 or 5, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the capacity and storage form of the data access module are determined.
7. The method according to claim 6, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, implementation forms of the convolutional neural network are simulated and compared, the requirements of hardware resources and convolutional neural network speed are balanced, and the connections between the instruction parsing and distribution module and the other modules, as well as the instruction execution feedback mechanism, are determined; the deployment parameters of the hardware modules are stored as instructions in the instruction parsing and distribution module and are sent as feedback information to the other hardware modules according to instruction execution.
8. The method according to claim 7, characterized in that the specific modules include an expansion module, a residual module, and a compression module.
CN201810539816.XA 2018-05-30 2018-05-30 Convolutional neural network hardware module deployment method Pending CN108416438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810539816.XA CN108416438A (en) 2018-05-30 2018-05-30 Convolutional neural network hardware module deployment method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810539816.XA CN108416438A (en) 2018-05-30 2018-05-30 Convolutional neural network hardware module deployment method

Publications (1)

Publication Number Publication Date
CN108416438A true CN108416438A (en) 2018-08-17

Family

ID=63140938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810539816.XA Pending CN108416438A (en) Convolutional neural network hardware module deployment method

Country Status (1)

Country Link
CN (1) CN108416438A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376849A (en) * 2018-09-26 2019-02-22 旺微科技(上海)有限公司 Control method and device for a convolutional neural network system
CN111104124A (en) * 2019-11-07 2020-05-05 北京航空航天大学 PyTorch framework-based rapid deployment method of convolutional neural network on FPGA
CN111897660A (en) * 2020-09-29 2020-11-06 深圳云天励飞技术股份有限公司 Model deployment method, model deployment device and terminal equipment
WO2021056677A1 (en) * 2019-09-27 2021-04-01 东南大学 Dual-phase coefficient adjustable analog multiplication calculation circuit for convolutional neural network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355244A (en) * 2016-08-30 2017-01-25 深圳市诺比邻科技有限公司 CNN (convolutional neural network) construction method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355244A (en) * 2016-08-30 2017-01-25 深圳市诺比邻科技有限公司 CNN (convolutional neural network) construction method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376849A (en) * 2018-09-26 2019-02-22 旺微科技(上海)有限公司 Control method and device for a convolutional neural network system
WO2021056677A1 (en) * 2019-09-27 2021-04-01 东南大学 Dual-phase coefficient adjustable analog multiplication calculation circuit for convolutional neural network
CN111104124A (en) * 2019-11-07 2020-05-05 北京航空航天大学 PyTorch framework-based rapid deployment method of convolutional neural network on FPGA
CN111104124B (en) * 2019-11-07 2021-07-20 北京航空航天大学 PyTorch framework-based rapid deployment method of convolutional neural network on FPGA
CN111897660A (en) * 2020-09-29 2020-11-06 深圳云天励飞技术股份有限公司 Model deployment method, model deployment device and terminal equipment
CN111897660B (en) * 2020-09-29 2021-01-15 深圳云天励飞技术股份有限公司 Model deployment method, model deployment device and terminal equipment

Similar Documents

Publication Publication Date Title
CN108416438A (en) Convolutional neural network hardware module deployment method
CN109117953B (en) Network parameter training method and system, server, client and storage medium
CN107918794A (en) Neural network processor based on computing array
CN108090560A (en) The design method of LSTM recurrent neural network hardware accelerators based on FPGA
CN106228238B (en) Method and system for accelerating deep learning algorithms on a field programmable gate array platform
CN107578095B (en) Neural computing device and processor comprising the computing device
CN107038064A (en) Virtual machine management method and device, storage medium
CN107343025B (en) Delay optimization method under distributed satellite cloud and mist network architecture and energy consumption constraint
CN104077797B (en) three-dimensional game animation system
CN109523621A (en) Object loading method and device, storage medium, and electronic device
CN114172937B (en) Dynamic service function chain arrangement method and system based on deep reinforcement learning
CN107609652A (en) Distributed system for performing machine learning and method thereof
CN112100155A (en) Cloud edge cooperative digital twin model assembling and fusing method
JP2009512048A (en) Method, apparatus and program for transmitting a roof and building structure in a three-dimensional representation of a building roof based on the roof and building structure
CN107818367A (en) Processing system and processing method for neural networks
CN110069815A (en) Index system construction method, system and terminal device
CN110251942A (en) Method and device for controlling a virtual character in a game scene
Shopf et al. March of the Froblins: simulation and rendering massive crowds of intelligent and detailed creatures on GPU
CN113590232B (en) Relay edge network task unloading method based on digital twinning
CN109558901A (en) Semantic segmentation training method and device, electronic equipment, and storage medium
CN112581578A (en) Cloud rendering system based on software definition
CN114691765A (en) Data processing method and device in artificial intelligence system
CN108196951B (en) GPU basin runoff simulation distributed scheduling system and method
JP2007052775A5 (en)
CN114641041A (en) Edge-intelligent-oriented Internet of vehicles slicing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180817
