CN108416438A - Convolutional neural network hardware module deployment method - Google Patents
Convolutional neural network hardware module deployment method
- Publication number
- CN108416438A CN108416438A CN201810539816.XA CN201810539816A CN108416438A CN 108416438 A CN108416438 A CN 108416438A CN 201810539816 A CN201810539816 A CN 201810539816A CN 108416438 A CN108416438 A CN 108416438A
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural networks
- module
- hardware
- hardware resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The present invention discloses a convolutional neural network hardware module deployment method, relating to the field of convolutional neural network implementation. An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
Description
Technical field
The present invention discloses a convolutional neural network hardware module deployment method, relating to the field of convolutional neural network implementation.
Background technology
A convolutional neural network (CNN) is a variant of the multi-layer perceptron (MLP). It grew out of a biological observation: cells in the visual cortex cover the entire visual field in a patchwise manner and, like filters, are locally sensitive to the input image, which lets them extract the spatial relationships of targets in natural images particularly well. A CNN mines this spatially local association of targets of interest in natural images by strengthening the local connection pattern between nodes of adjacent layers in the network.
At present CNNs are mostly deployed on x86 CPU + GPU hardware platforms, running software frameworks such as TensorFlow or Caffe on an operating system in real time; training and inference begin once the model parameters are set. At the application end, because of resource and other limitations, often only smaller models can be run, model execution is inefficient, and recognition speed is far below that of the PC side. The invention discloses a convolutional neural network hardware module deployment method that realizes a given convolutional neural network model in the form of hardware circuits. By combining the target hardware resources with the volume of the neural network model and trading off resources against speed requirements, it allocates quantities to the hardware modules and partitions their connections, so that hardware logic and storage resources are fully utilized and both the deployment of each network layer and the conversion between layers of different structure are completed efficiently and quickly. Various neural-network hardware structures can be realized flexibly, and cutting-edge research can be converted into hardware implementations quickly, improving development efficiency.
Summary of the invention
Aiming at the problems of the prior art, the present invention provides a convolutional neural network hardware module deployment method. Using hardware resources of a specific structure and adapting to the characteristics of different neural networks, it deploys a CNN to the application end in real time, which can greatly unlock the value of the neural network.
The concrete scheme proposed by the present invention is:
A convolutional neural network hardware module deployment method:
An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
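As a rough illustration of the simulate-and-balance step described above, the following sketch enumerates candidate deployments, scores each with toy cost models for resource usage and speed, and keeps the fastest candidate that fits the hardware budget. All names and cost formulas here (`Candidate`, `estimate_resources`, `estimate_cycles`, `choose_deployment`) are assumptions made for illustration, not part of the patent.

```python
# Illustrative sketch (not from the patent): the upper-layer compiler's
# simulate-and-balance step over candidate hardware deployments.
from dataclasses import dataclass


@dataclass
class Candidate:
    conv_units: int   # number of convolution basic units
    pool_units: int   # number of pooling units
    unit_rows: int    # input rows processed per convolution unit


def estimate_resources(c: Candidate) -> int:
    # assumed linear logic-cost model for the candidate deployment
    return c.conv_units * c.unit_rows * 10 + c.pool_units * 5


def estimate_cycles(c: Candidate, model_ops: int) -> int:
    # idealized speed model: more parallel capacity -> fewer cycles
    return model_ops // max(1, c.conv_units * c.unit_rows)


def choose_deployment(candidates, budget, model_ops):
    # balance resources against speed: among deployments that fit the
    # hardware budget, pick the fastest one
    feasible = [c for c in candidates if estimate_resources(c) <= budget]
    return min(feasible, key=lambda c: estimate_cycles(c, model_ops))
```

For example, with a budget of 6000 resource units, a 64-unit candidate that overshoots the budget is rejected in favour of the fastest candidate that still fits.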
In the method, the hardware modules of the convolutional neural network comprise basic modules of the convolutional neural network and model-specific modules determined by the features of the convolutional neural network model, wherein the basic modules include a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, the basic-unit size and quantity of the convolution module are determined, and it is determined whether the basic units of the convolution module are combined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the basic-unit size and quantity of the pooling module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the internal tree structure, quantity, and number of caches of the accumulation module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the capacity and storage form of the data access module are determined.
In the method, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the connections between the instruction parsing and distribution module and the other modules, together with the instruction-execution feedback mechanism, are determined; the deployment parameters of the hardware modules are stored as instructions in the instruction parsing and distribution module and are sent to the other hardware modules as feedback information during instruction execution.
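The paragraph above stores deployment parameters as instructions inside the instruction parsing and distribution module. A minimal sketch of such parameter instructions, under an assumed 32-bit word layout (4-bit module id, 8-bit parameter id, 20-bit value) that is purely illustrative and not specified by the patent:

```python
# Illustrative sketch: packing a deployment parameter into an instruction
# word for the instruction parsing and distribution module.  The layout
# (4-bit module id | 8-bit parameter id | 20-bit value) is an assumption.

def encode_param_instruction(module_id: int, param_id: int, value: int) -> int:
    """Pack (module, parameter, value) into a single 32-bit word."""
    assert 0 <= module_id < 16 and 0 <= param_id < 256 and 0 <= value < 2 ** 20
    return (module_id << 28) | (param_id << 20) | value


def decode_param_instruction(word: int):
    """Unpack an instruction word back into (module_id, param_id, value)."""
    return (word >> 28) & 0xF, (word >> 20) & 0xFF, word & 0xFFFFF
```

A module receiving such a word can recover its parameter by decoding, which is one simple way the stored parameters could be distributed as feedback information during instruction execution.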
In the method, the model-specific modules include an expansion module, a residual module, and a compression module.
The beneficial effects of the present invention are:
The present invention provides a convolutional neural network hardware module deployment method. By combining the target hardware resources with the volume of the neural network model and trading off resources against speed requirements, it allocates quantities to the hardware modules and partitions their connections, making full use of hardware logic and storage resources and efficiently and quickly completing the deployment of each layer of the convolutional neural network as well as the conversion between layers of different structure. Various neural-network hardware structures can be realized flexibly, and cutting-edge research can be converted into hardware implementations quickly, improving development efficiency.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware module deployment relations of the present invention;
Fig. 2 is a deployment schematic diagram of an embodiment of the present invention.
Specific embodiments
The present invention provides a convolutional neural network hardware module deployment method:
An upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
The present invention is further described below in conjunction with the drawings and specific embodiments.
Among the hardware modules of the convolutional neural network in the present invention, the basic modules of the convolutional neural network include, but are not limited to: a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module, each used to realize one step or a few steps of the convolutional neural network's operations;
the model-specific modules determined by the features of the convolutional neural network model include, but are not limited to: an expansion module, a residual module, a compression module, etc.
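The idea that each basic module realizes one step, or a few steps, of the network's computation, with modules then chained according to the deployment's connection plan, can be sketched as follows. The stage functions are placeholders standing in for the hardware modules, not their actual behaviour:

```python
# Placeholder sketch: basic modules as composable pipeline stages.

def convolution(x):
    # convolution module (stand-in transform, not a real convolution)
    return [v * 2 for v in x]


def accumulate(x):
    # accumulation module: sum partial channel results into one value
    return [sum(x)]


def pool(x):
    # pooling module (stand-in reduction)
    return [max(x)]


def run_pipeline(x, stages):
    # chain the modules in the order given by the connection plan
    for stage in stages:
        x = stage(x)
    return x
```

`run_pipeline([1, 2, 3], [convolution, accumulate, pool])` chains the three stand-in stages in the order convolution, accumulation, pooling, mirroring the data path of the embodiment below.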
The parameters of each hardware module are determined by the upper-layer compiler, which, according to the target hardware resources and the volume of the neural network model, allocates the quantity and connections of each module and, based on the features of the convolutional neural network model, determines: the basic-unit size and quantity of the convolution module and whether its units are combined; the basic-unit size and quantity of the pooling module; the connection form, internal tree structure, quantity, and number of caches of the accumulation module; the capacity, connection form, and storage form of the data access module; the connections between the instruction parsing and distribution module and the other modules, the instruction-execution feedback mechanism, and other such parameters. These parameters are stored as instructions in the instruction parsing and distribution module and are sent to the other modules as feedback information during instruction execution.
The parameters of each hardware module can be optimized through manual modification, and the hardware module deployment can be changed by the upper-layer compiler at any time to adapt to different neural network models.
In a practical application, as in Fig. 2, the target hardware resources are divided into 64 basic convolution computation units, each of which can perform convolution between a 16-row slice of the input feature map and a convolution kernel.
This layer of the network must complete the convolution from 256x256x64 to 256x256x128 together with the non-linear and pooling computations. The input feature map is partitioned into blocks of 16 rows each, with 2 rows repeated between adjacent blocks so that block-edge information is not lost.
Each basic convolution unit is loaded with 4 convolution kernels to utilize the target hardware resources maximally. Block ① is first convolved with the first 4 kernels; traversing all 128 kernels takes 128/4 = 32 rounds of computation, completing the convolution and non-linear operations of block ① over all input channels. All blocks of the input channels are then traversed, completing the computation over the whole input.
The intermediate results of the 64 basic convolution units are fed, 64 at a time, into the accumulation units, which perform the input-channel accumulation in parallel; the result enters the pooling unit and, after pooling, is written back through the data access module.
After all computation required for this layer is completed, a new round of instructions is issued by the instruction parsing and distribution module, redeploying each hardware module with all parameters updated to meet the computation requirements of the next layer of the convolutional neural network.
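The arithmetic of this worked example can be checked with a short sketch, assuming the stated numbers (16-row blocks with a 2-row overlap over a 256-row feature map, 4 kernels per unit, 128 kernels in total); `plan_blocks` is an illustrative helper, not taken from the patent:

```python
# Checking the worked example's numbers (an illustrative reconstruction).

def plan_blocks(height: int, block_rows: int = 16, overlap: int = 2):
    """Partition feature-map rows into overlapping blocks of 16 rows."""
    step = block_rows - overlap            # 14 fresh rows per block
    starts = range(0, height - overlap, step)
    # clamp the final block to the feature-map height
    return [(s, min(s + block_rows, height)) for s in starts]


total_kernels, kernels_per_unit = 128, 4
rounds = total_kernels // kernels_per_unit  # 128 / 4 = 32 rounds per block
blocks = plan_blocks(256)
```

Under these assumptions, each pair of adjacent blocks shares exactly 2 rows, so block-edge information is preserved, and traversing all 128 kernels indeed takes 32 rounds.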
Claims (8)
1. A convolutional neural network hardware module deployment method, characterized in that:
an upper-layer compiler of the convolutional neural network, according to the target hardware resources and the data volume of the convolutional neural network model, performs simulated comparison of the implementation forms of the convolutional neural network, balances the requirements of hardware resources and convolutional neural network speed, determines the deployment parameters of each hardware module of the convolutional neural network, allocates the quantity of each hardware module, and determines the connections between hardware modules, thereby realizing the hardware module deployment of the convolutional neural network.
2. The method according to claim 1, characterized in that the hardware modules of the convolutional neural network comprise basic modules of the convolutional neural network and model-specific modules determined by the features of the convolutional neural network model, wherein the basic modules include a convolution module, a pooling module, an accumulation module, a data access module, and an instruction parsing and distribution module.
3. The method according to claim 2, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, the basic-unit size and quantity of the convolution module are determined, and it is determined whether the basic units of the convolution module are combined.
4. The method according to claim 2 or 3, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the basic-unit size and quantity of the pooling module are determined.
5. The method according to claim 4, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the internal tree structure, quantity, and number of caches of the accumulation module are determined.
6. The method according to claim 2, 3 or 5, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, and the capacity and storage form of the data access module are determined.
7. The method according to claim 6, characterized in that, according to the target hardware resources and the data volume of the convolutional neural network model, simulated comparison of the implementation forms of the convolutional neural network is performed, the requirements of hardware resources and convolutional neural network speed are balanced, the connections between the instruction parsing and distribution module and the other modules and the instruction-execution feedback mechanism are determined, and the deployment parameters of the hardware modules are stored as instructions in the instruction parsing and distribution module and sent to the other hardware modules as feedback information during instruction execution.
8. The method according to claim 7, characterized in that the model-specific modules include an expansion module, a residual module, and a compression module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810539816.XA CN108416438A (en) | 2018-05-30 | 2018-05-30 | Convolutional neural network hardware module deployment method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108416438A true CN108416438A (en) | 2018-08-17 |
Family
ID=63140938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810539816.XA Pending CN108416438A (en) | 2018-05-30 | 2018-05-30 | Convolutional neural network hardware module deployment method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416438A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106355244A (en) * | 2016-08-30 | 2017-01-25 | 深圳市诺比邻科技有限公司 | CNN (convolutional neural network) construction method and system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376849A (en) * | 2018-09-26 | 2019-02-22 | 旺微科技(上海)有限公司 | A kind of control method and device of convolutional neural networks system |
WO2021056677A1 (en) * | 2019-09-27 | 2021-04-01 | 东南大学 | Dual-phase coefficient adjustable analog multiplication calculation circuit for convolutional neural network |
CN111104124A (en) * | 2019-11-07 | 2020-05-05 | 北京航空航天大学 | Method for rapid deployment of a convolutional neural network on an FPGA based on the PyTorch framework |
CN111104124B (en) * | 2019-11-07 | 2021-07-20 | 北京航空航天大学 | Method for rapid deployment of a convolutional neural network on an FPGA based on the PyTorch framework |
CN111897660A (en) * | 2020-09-29 | 2020-11-06 | 深圳云天励飞技术股份有限公司 | Model deployment method, model deployment device and terminal equipment |
CN111897660B (en) * | 2020-09-29 | 2021-01-15 | 深圳云天励飞技术股份有限公司 | Model deployment method, model deployment device and terminal equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108416438A (en) | Convolutional neural network hardware module deployment method | |
CN109117953B (en) | Network parameter training method and system, server, client and storage medium | |
CN107918794A (en) | Neural network processor based on computing array | |
CN108090560A (en) | The design method of LSTM recurrent neural network hardware accelerators based on FPGA | |
CN106228238B (en) | Accelerate the method and system of deep learning algorithm on field programmable gate array platform | |
CN107578095B (en) | Neural computing device and processor comprising the computing device | |
CN107038064A (en) | Virtual machine management method and device, storage medium | |
CN107343025B (en) | Delay optimization method under distributed satellite cloud and mist network architecture and energy consumption constraint | |
CN104077797B (en) | three-dimensional game animation system | |
CN109523621A (en) | Loading method and device, storage medium, the electronic device of object | |
CN114172937B (en) | Dynamic service function chain arrangement method and system based on deep reinforcement learning | |
CN107609652A (en) | Perform the distributed system and its method of machine learning | |
CN112100155A (en) | Cloud edge cooperative digital twin model assembling and fusing method | |
JP2009512048A (en) | Method, apparatus and program for transmitting a roof and building structure in a three-dimensional representation of a building roof based on the roof and building structure | |
CN107818367A (en) | Processing system and processing method for neutral net | |
CN110069815A (en) | Index system construction method, system and terminal device | |
CN110251942A (en) | Control the method and device of virtual role in scene of game | |
Shopf et al. | March of the Froblins: simulation and rendering massive crowds of intelligent and detailed creatures on GPU | |
CN113590232B (en) | Relay edge network task unloading method based on digital twinning | |
CN109558901A (en) | A kind of semantic segmentation training method and device, electronic equipment, storage medium | |
CN112581578A (en) | Cloud rendering system based on software definition | |
CN114691765A (en) | Data processing method and device in artificial intelligence system | |
CN108196951B (en) | GPU basin runoff simulation distributed scheduling system and method | |
JP2007052775A5 (en) | ||
CN114641041A (en) | Edge-intelligent-oriented Internet of vehicles slicing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180817 |