CN109409512A - Flexibly configurable neural network computing unit, computing array and construction method thereof - Google Patents

Flexibly configurable neural network computing unit, computing array and construction method thereof

Info

Publication number
CN109409512A
Authority
CN
China
Prior art keywords
data
buffer
state
computing unit
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811133940.2A
Other languages
Chinese (zh)
Other versions
CN109409512B (en)
Inventor
任鹏举
樊珑
赵博然
宗鹏陈
陈飞
郑南宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201811133940.2A
Publication of CN109409512A
Application granted
Publication of CN109409512B
Status: Active
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention discloses a flexibly configurable neural network computing unit, a computing array, and a construction method thereof. The neural network computing unit comprises: a configurable storage module, a configurable control module, and a time-multiplexable multiply-accumulate module. The configurable storage module comprises: a feature-map data buffer, a stride data buffer, and a weight data buffer. The configurable control module comprises: a counter module and a state-machine module. The multiply-accumulate module comprises: a multiplier and an accumulator. The present invention supports convolution computations of arbitrary type as well as the parallel computation of convolution kernels of several sizes, fully exploits the flexibility and data reusability of the convolutional neural network computing unit, greatly reduces the system power consumption caused by data movement, and improves the computational efficiency of the system.

Description

Flexibly configurable neural network computing unit, computing array and construction method thereof
Technical field
The invention belongs to the field of neural network hardware architecture, and in particular relates to a flexibly configurable neural network computing unit, a computing array, and a construction method thereof.
Background art
A flexible hardware computing architecture has an important influence on the hardware implementation of convolutional neural networks. The convolutional layer, as the most important structure in a convolutional neural network, is characterized by a large amount of computation and strong data reusability. Through weight sharing, the convolutional layer reduces the complexity of the network model, considerably reduces the number of parameters, and avoids the complex feature extraction and data reconstruction processes of traditional recognition algorithms.
In a convolutional neural network, the main function of the convolutional layer is to convolve the same group of input feature maps with one group of convolution kernels per output channel, yielding as many output feature maps as there are output channels and thereby completing the feature extraction of the feature maps. As convolutional neural networks continue to develop and the demands placed on them gradually grow, network models multiply, networks become deeper, and the convolution modes of the convolutional layers become complex and varied.
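By way of illustration only, the convolution described above can be sketched in a few lines of NumPy (a minimal model for exposition, not part of the claimed hardware; the array shapes and names are chosen here for clarity):

```python
import numpy as np

def conv_layer(fmap, kernels, stride):
    """Convolve one stack of input feature maps with one kernel group
    per output channel, as described above.
    fmap:    (A, H, W)     A input channels
    kernels: (B, A, K, K)  B output channels, each with A KxK kernels
    returns: (B, H_out, W_out), one output feature map per output channel
    """
    A, H, W = fmap.shape
    B, _, K, _ = kernels.shape
    H_out = (H - K) // stride + 1
    W_out = (W - K) // stride + 1
    out = np.zeros((B, H_out, W_out))
    for b in range(B):                       # one map per output channel
        for i in range(H_out):
            for j in range(W_out):
                win = fmap[:, i*stride:i*stride+K, j*stride:j*stride+K]
                out[b, i, j] = np.sum(win * kernels[b])
    return out
```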
Therefore, a neural network computing unit structure that is highly flexible, offers high computational performance, and can recycle data is of great significance for the hardware implementation of the convolutional layer. Most current hardware implementations of convolutional-layer computing units can only complete one type of convolution mode, cannot support network models containing convolutional layers of different types, and cannot make full use of the data reusability of the convolutional layer.
Summary of the invention
The purpose of the present invention is to provide a flexibly configurable neural network computing unit, a computing array, and a construction method thereof. The method effectively enhances the flexibility of the convolutional layer in hardware implementation, improves the computational efficiency of the system, and gives full play to the data reusability of the convolutional layer, thereby reducing system power consumption and the use of storage resources to a certain extent.
In order to achieve the above purpose, the present scheme adopts the following technical solution:
A flexibly configurable neural network computing unit, comprising: a configurable storage module, a configurable control module, and a time-multiplexable multiply-accumulate module;
The configurable storage module comprises: a feature-map data buffer, a stride data buffer, and a weight data buffer;
The configurable control module comprises: a counter module and a state-machine module;
The multiply-accumulate module comprises: a multiplier and an accumulator.
Further, the feature-map data buffer stores the partial feature-map data used during a convolution computation and recycles feature-map data that is shared between computations. The maximum length of the buffer is L1 = max{K1A1, K2A2, …, KiAi}, where K is the convolution kernel size of a convolutional layer, A is the number of input channels to be mapped within the computing unit, and i is the index of the convolutional layer in the target network;
The stride data buffer supplies the data that must be refreshed in the feature-map buffer when the convolution kernel slides by one stride. The maximum length of the buffer is L2 = max{S1A1, S2A2, …, SiAi}, where S is the stride of the convolution kernel in a convolutional layer;
The weight data buffer stores the weight data and allows it to be reused. Its length is L3 = max{K1A1B1, K2A2B2, …, KiAiBi}, where B is the number of output channels to be mapped within the computing unit.
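The three maxima above size each buffer once for the worst case over all layers of the target network. A small sketch of the sizing rule follows (the per-layer parameter records are a hypothetical representation; K, S, A, B are as defined above):

```python
def buffer_lengths(layers):
    """Worst-case buffer lengths over all convolutional layers.
    layers: list of dicts with kernel size K, stride S, and the numbers
    of mapped input/output channels A and B for each layer."""
    L1 = max(l["K"] * l["A"] for l in layers)           # feature-map buffer
    L2 = max(l["S"] * l["A"] for l in layers)           # stride-data buffer
    L3 = max(l["K"] * l["A"] * l["B"] for l in layers)  # weight buffer
    return L1, L2, L3

# e.g. a 3x3/stride-1 layer and a 5x5/stride-2 layer, 4 in / 8 out channels
print(buffer_lengths([dict(K=3, S=1, A=4, B=8), dict(K=5, S=2, A=4, B=8)]))
# (20, 8, 160)
```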
Further, the counter module comprises an input-data counter, an input-weight counter, an output-data counter, an output-channel counter, and an output-feature-map-size counter;
The state-machine module contains, for each convolution kernel size, a corresponding feature-map-buffer state machine and weight-buffer state machine; each state machine performs its state transitions according to the values of the counters in the counter module.
Further, the neural network computing unit is equipped with a feature-map data input port and a weight data input port;
The feature-map data input port is connected to the input of the first selector; the two outputs of the first selector are connected to the input of the stride data buffer and to the first input of the second selector respectively; the output of the stride data buffer is connected to the second input of the second selector; and the output of the second selector is connected to the input of the feature-map data buffer;
The weight data input port is connected to the input of the weight data buffer;
The outputs of the feature-map data buffer and the weight data buffer are connected to the two inputs of the multiplier; the output of the multiplier is connected, through a register, the accumulator, and the fourth selector, to the output of the neural network computing unit.
Further, the state-machine module contains, for each convolution kernel size, a corresponding feature-map-buffer state machine and weight-buffer state machine; each state machine performs its state transitions according to the values of the counters in the counter module;
The states of the feature-map data buffer comprise: an initialization state, a data-ready state, a wait state, a full-cycle state, a data-update state, a half-cycle state, and a no-cycle state;
The states of the weight data buffer comprise: an initialization state, a data-ready state, a wait state, a full-cycle state, and a no-cycle state.
Further, in the initialization state, no data has yet entered the computing unit;
In the data-ready state, data is entering the computing unit but the amount of input data is not yet sufficient to start computing;
In the wait state, convolution kernels of different sizes are computed in parallel and the output results must stay synchronized; a computing unit responsible for a smaller kernel has less work to do and therefore waits for the computing units handling larger kernels;
In the full-cycle state, the data currently output by the buffer will be reused; while entering the multiply-accumulate module it is also cycled back to the tail of the buffer's allocated space;
The data-update state exists only in the feature-map data buffer; when the data currently output no longer needs to be reused, it enters the multiply-accumulate module while new data taken from the stride data buffer is written to the tail of the feature-map data buffer;
The half-cycle state exists only in the feature-map data buffer and follows the data-update state; the data currently output enters the multiply-accumulate module and is simultaneously returned to the position in the buffer just before the newly updated data;
In the no-cycle state, the data currently output no longer needs to be recycled; it enters the multiply-accumulate module only and is not returned to the buffer.
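The four cycling modes above can be modeled in software as operations on a queue (a behavioral sketch of our reading of the description, not RTL; in particular, the reinsertion position used for the half-cycle state is an assumption):

```python
from collections import deque
from enum import Enum, auto

class Mode(Enum):
    FULL_CYCLE = auto()   # datum reused: recycle to the buffer tail
    UPDATE     = auto()   # datum dead: refill the tail from the stride buffer
    HALF_CYCLE = auto()   # datum reused: reinsert just before the new data
    NO_CYCLE   = auto()   # datum consumed: send to the MAC only

def emit(buf: deque, mode: Mode, stride_buf: deque):
    """Pop one datum for the multiply-accumulate module and recycle or
    refresh the buffer according to the current state."""
    d = buf.popleft()
    if mode is Mode.FULL_CYCLE:
        buf.append(d)
    elif mode is Mode.UPDATE:
        buf.append(stride_buf.popleft())
    elif mode is Mode.HALF_CYCLE:
        buf.insert(len(buf) - 1, d)   # assumed position: before the update
    return d                          # NO_CYCLE: dropped from the buffer
```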
A computing array is generated by instantiating a plurality of the configurable computing units. The computing array is divided into regions, and different regions can be supplied with different convolution layer parameters, completing the parallel computation of convolution modes of different kinds.
A computing array is generated by connecting the flexibly configurable neural network computing units in a row-stationary dataflow. The scale of the computing array is determined by the hardware resources, the target network model, and the computational performance required of the system. The width of the computing array is K, where K must be greater than or equal to the maximum convolution kernel size Kmax in the network model, and greater than or equal to the sum of the kernel sizes that must be computed in parallel when one convolutional layer contains kernels of different sizes. The base length of the computing array is H, where H is the minimum output feature-map size over all convolutional layers in the network model; the actual array length is extended in multiples of 2^n according to the available hardware resources and the computational performance required of the system. When kernels of different sizes in a convolutional layer must be computed in parallel, with kernel sizes K1, K2, …, Ki satisfying K1 + K2 + … + Ki ≤ K, the computing array is divided laterally into i regions of sizes K1*H, K2*H, …, Ki*H; different regions receive different convolution-type parameters, and the computing units of each region configure their own storage and computing modules, completing the parallel computation of kernels of several sizes.
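As an editorial illustration of these sizing rules (a sketch under the stated constraints; the hardware budget is expressed here simply as a maximum array length):

```python
def plan_array(kernel_sizes, k_max_in_model, h_min, max_length):
    """Width K, length H, and lateral regions for a computing array."""
    K = max(k_max_in_model, sum(kernel_sizes))  # width constraint
    H = h_min                                   # base length
    while 2 * H <= max_length:                  # extend in powers of two
        H *= 2
    regions = [(k, k * H) for k in kernel_sizes]  # (width, unit count)
    return K, H, regions

# e.g. 3x3 and 5x5 kernels in parallel, Kmax = 7, minimum output map 14
print(plan_array([3, 5], k_max_in_model=7, h_min=14, max_length=64))
# (8, 56, [(3, 168), (5, 280)])
```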
A construction method of the flexibly configurable neural network computing unit, comprising the following steps:
Step one: extract the network parameters from the target network model;
Step two: design the configurable storage module of the neural network computing unit according to step one; it stores the partial feature-map data and weight data used for computation and comprises: a feature-map data buffer, a stride data buffer, and a weight data buffer;
Step three: design the configurable control module of the neural network computing unit according to step one; for different convolution modes the configurable control module configures different buffer sizes in the storage module, generates the various working modes each buffer uses during convolution, and controls each buffer to work in the corresponding mode; the configurable control module comprises: a counter module and a state-machine module;
Step four: design the multiply-accumulate module of the neural network computing unit according to step one; it multiplies feature-map data by weights and accumulates the products into partial sums of the convolution result, and comprises a time-multiplexable multiplier and adder;
Step five: combining steps two, three, and four, supply the configurable control module of the neural network computing unit with five convolution layer parameters through the external input ports: convolution kernel size k, kernel stride s, output feature-map size h, number of mapped input channels a, and number of mapped output channels b. The configurable control module configures, for the configurable storage module, the buffer space required by the convolution of this layer, and controls the configurable storage module to output the corresponding data to the multiply-accumulate module. By supplying the corresponding convolution parameters to the computing unit, different convolutional layers each complete their share of the convolution on the same neural network computing unit.
Further, step one specifically comprises: according to the target network model, extract the required parameters, including the convolution kernel size Ki and sliding stride Si of each convolutional layer, the output feature-map size Hi of each convolutional layer, and the numbers of input and output channels Ai and Bi to be mapped within the computing unit for each convolutional layer, where i is the index of the convolutional layer;
In step two: the feature-map data buffer stores the partial pixel data used during a convolution computation and recycles pixel data that is shared; its length is max{K1A1, K2A2, …, KiAi}. The stride data buffer supplies the data that must be refreshed in the feature-map buffer when the convolution kernel slides by one stride; its length is max{S1A1, S2A2, …, SiAi}. The weight data buffer stores the weight data and allows it to be reused; its length is max{K1A1B1, K2A2B2, …, KiAiBi};
In step three: the counter module comprises an input-data counter, an input-weight counter, an output-data counter, an output-channel counter, and an output-feature-map-size counter. The state-machine module contains, for each convolution kernel size, a corresponding feature-map-buffer state machine and weight-buffer state machine; each state machine performs its state transitions according to the values of the counters in the counter module. The states comprise: an initialization state, a data-ready state, a wait state, a full-cycle state, a data-update state, a half-cycle state, and a no-cycle state.
Further, the different states of the state machine determine the different working modes of a buffer, specifically:
In the initialization state, no data has yet entered the computing unit;
In the data-ready state, data is entering the computing unit but the amount of input data is not yet sufficient to start computing;
In the wait state, convolution kernels of different sizes are computed in parallel and the output results must stay synchronized; a computing unit responsible for a smaller kernel has less work to do and therefore waits for the computing units handling larger kernels;
In the full-cycle state, the data currently output by the buffer will be reused; while entering the multiply-accumulate module it is also cycled back to the tail of the buffer's allocated space;
The data-update state exists only in the feature-map data buffer; when the data currently output no longer needs to be reused, it enters the multiply-accumulate module while new data taken from the stride data buffer is written to the tail of the feature-map data buffer;
The half-cycle state exists only in the feature-map data buffer and follows the data-update state; the data currently output enters the multiply-accumulate module and is simultaneously returned to the position in the buffer just before the newly updated data;
In the no-cycle state, the data currently output no longer needs to be recycled; it enters the multiply-accumulate module only and is not returned to the buffer.
Further, step four specifically comprises: the multiply-accumulate module comprises a multiplier and an accumulator. By time multiplexing, the working frequency of the multiplier and accumulator is raised N-fold, so that N neural network computing units can share one multiplier and accumulator; the accumulation depth of the accumulator equals the convolution kernel size of the current convolutional layer.
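A behavioral sketch of this time multiplexing (our illustration, not RTL): one multiplier-accumulator running N times faster serves N computing units in rotation, and each accumulation runs over K products, K being the kernel size of the current layer.

```python
def shared_mac(units, K):
    """units: N (feature, weight) operand streams, each K values long.
    The shared MAC performs N fast cycles per slow unit cycle."""
    acc = [0.0] * len(units)
    for k in range(K):                      # slow cycles
        for n, (f, w) in enumerate(units):  # N fast cycles per slow cycle
            acc[n] += f[k] * w[k]
    return acc

print(shared_mac([([1, 2, 3], [4, 5, 6]), ([1, 1, 1], [2, 2, 2])], K=3))
# [32.0, 6.0]
```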
Further, a computing array is generated by instantiating a plurality of the configurable computing units; the array is divided into regions, and different regions can be supplied with different convolution layer parameters, completing the parallel computation of convolution modes of different kinds.
Further, in step five, the external input ports supply five input signals for the current convolutional layer: convolution kernel size k, kernel stride s, output feature-map size h, number of input channels a mapped inside the computing unit, and number of output channels b. The control module configures the storage space of each buffer: the length of the feature-map data buffer is set to k*a, the length of the stride data buffer to s*a, and the length of the weight data buffer to k*a*b. The control module also configures the upper limit of each counter: the upper limits of the input-data counter and the output-data counter are set to k*a, the upper limit of the input-weight counter to k*a*b, the upper limit of the output-channel counter to b, and the upper limit of the output-feature-map-size counter to h. When an input counter reaches its upper limit, the computing unit starts computing, and each output counter counts accordingly and controls the state transitions of the state machine. When every output counter reaches its upper limit, one convolution computation over all or part of the output channels of the convolutional layer has been completed.
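The configuration rules of this step reduce to a handful of products, collected below (a software model of the control module's configuration step; the dictionary keys are illustrative names):

```python
def configure_unit(k, s, h, a, b):
    """Buffer lengths and counter limits derived from the five layer
    parameters: kernel size k, stride s, output feature-map size h,
    mapped input channels a, mapped output channels b."""
    return {
        "feature_buffer_len":   k * a,
        "stride_buffer_len":    s * a,
        "weight_buffer_len":    k * a * b,
        "input_data_limit":     k * a,
        "output_data_limit":    k * a,      # same limit as input data
        "input_weight_limit":   k * a * b,
        "output_channel_limit": b,
        "output_map_limit":     h,
    }

print(configure_unit(k=3, s=1, h=14, a=4, b=8))
```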
Further, according to the hardware resources and the computational performance required of the system, a plurality of the configurable computing units can be instantiated and interconnected to generate a convolution computing array, on which the convolutions of convolutional layers of different types can be completed. For network models in which one convolutional layer contains kernels of two or more sizes, the array can be divided into regions, and different regions can be supplied with different convolution parameters. To guarantee that all regions of the array output their results synchronously, the time difference between the output results of computing units in different regions is derived from the difference in kernel size; the computing units in the region with the lighter workload wait until this time difference reaches zero before resuming computation, guaranteeing the synchronism of the array output and completing the parallel computation of convolution modes of different kinds.
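Under the counter model above, each computing unit spends k*a multiply-accumulate cycles per output datum, so the gap that a small-kernel region must wait out can be estimated as follows (a simplified reading offered for illustration; the text itself only states that the gap follows from the kernel-size difference):

```python
def wait_cycles(k_small, k_large, a):
    """Idle cycles per output for the lighter region, assuming k*a
    multiply-accumulate cycles per output (the input-counter limit)."""
    return (k_large - k_small) * a

print(wait_cycles(3, 5, a=4))  # 8 idle cycles per output in this model
```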
Compared with the prior art, the present invention has the following advantages. The present invention discloses a flexibly configurable neural network computing unit, a computing array, and a construction method thereof. First, the parameters needed for the design are extracted from the target network model, and the internal structure of the neural network computing unit is designed from these parameters. For each convolution mode requested through the external inputs, the control module of the computing unit configures the storage and computing modules so that the corresponding mode completes some or all of the convolution computation. A complete convolution computing array is produced by instantiating and arranging several configurable neural network computing units; the array can be divided into regions, different regions can receive different convolution parameters, and convolution modes of different types can thus be computed in parallel. The present invention provides a hardware architecture for the convolutional layer of convolutional neural networks that, while guaranteeing the computational performance of the system, supports the convolution modes of convolutional layers in different network models, greatly improving the flexibility of the system. The various working modes of the buffers inside the computing unit make full use of the data reusability of convolutional neural networks, effectively reducing the system power consumption produced by data movement and relieving the storage burden to a certain extent. A computing array composed of multiple computing units supports the parallel computation of kernels of different sizes, fully exploiting the algorithmic parallelism and data reusability of the convolutional layer in convolutional neural networks.
Brief description of the drawings
Fig. 1 is an overall structural diagram of a flexibly configurable neural network computing unit of the present invention;
Fig. 2 is a schematic diagram of the control module in a flexibly configurable neural network computing unit of the present invention;
Fig. 3 is the state-machine diagram of the feature-map data buffer in the control module;
Fig. 4 is a schematic diagram of a computing array generated by instantiating multiple computing units of the present invention.
Detailed description of the embodiments
The present invention is described below in further detail with reference to the accompanying drawings.
Referring to Fig. 1, a flexibly configurable neural network computing unit of the present invention comprises: a configurable storage module, a configurable control module, and a time-multiplexable multiply-accumulate module. The configurable storage module comprises: a feature-map data buffer, a stride data buffer, and a weight data buffer. The configurable control module comprises: a counter module and a state-machine module. The multiply-accumulate module comprises: a multiplier and an accumulator.
The feature-map data buffer stores the partial feature-map data used during a convolution computation and recycles feature-map data that is shared; its maximum length is L1 = max{K1A1, K2A2, …, KiAi}, where K is the convolution kernel size of a convolutional layer, A is the number of input channels to be mapped within the computing unit, and i is the index of the convolutional layer in the target network. The stride data buffer supplies the data that must be refreshed in the feature-map buffer when the convolution kernel slides by one stride; its maximum length is L2 = max{S1A1, S2A2, …, SiAi}, where S is the stride of the convolution kernel in a convolutional layer. The weight data buffer stores the weight data and allows it to be recycled; its length is L3 = max{K1A1B1, K2A2B2, …, KiAiBi}, where B is the number of output channels to be mapped within the computing unit.
The neural network computing unit is equipped with a feature-map data input port and a weight data input port. The feature-map data input port is connected to the input of first selector 1; the two outputs of first selector 1 are connected to the input of the stride data buffer and to the first input of second selector 2 respectively; the output of the stride data buffer is connected to the second input of second selector 2; and the output of second selector 2 is connected to the input of the feature-map data buffer. The weight data input port is connected to the input of the weight data buffer. The outputs of the feature-map data buffer and the weight data buffer are connected to the two inputs of the multiplier; the output of the multiplier is connected, through a register, the accumulator, and fourth selector 4, to the output of the neural network computing unit. The output of the feature-map data buffer is also connected back, through the third selector, to the buffer's input.
Referring to Fig. 2, the internal structure of the buffer control module consists mainly of the counter module and the state-machine module. The counter module comprises an input-data counter, an input-weight counter, an output-data counter, an output-channel counter, and an output-feature-map-size counter. The state-machine module contains, for each convolution kernel size, a corresponding feature-map-buffer state machine and weight-buffer state machine; each state machine performs its state transitions according to the values of the counters in the counter module.
Referring to Fig. 3, the state-machine diagram of the feature-map data buffer in the control module comprises the states: initialization state S0, data-ready state S1, wait state S6, full-cycle state S2, data-update state S3, half-cycle state S4, and no-cycle state S5; the different states determine the different working modes of the buffer. The external array control signals supply five items of information to the buffer control module: the convolution kernel size, the sliding stride of the convolution window, the output feature-map size, and the numbers of input and output channels mapped onto the array. From these, the upper limit of each counter is obtained, and the counter values drive the state transitions of the state machine for the corresponding kernel size, completing the operation of each buffer under the different kernel sizes. The states of the weight data buffer comprise: the initialization state, the data-ready state, the wait state, the full-cycle state, and the no-cycle state.
The different states of the state machine determine the different working modes of a buffer, specifically:
In the initialization state, no data has yet entered the computing unit;
In the data-ready state, data is entering the computing unit but the amount of input data is not yet sufficient to start computing;
In the wait state, convolution kernels of different sizes are computed in parallel and the output results must stay synchronized; a computing unit responsible for a smaller kernel has less work to do and therefore waits for the computing units handling larger kernels;
In the full-cycle state, the data currently output by the buffer will be reused; while entering the multiply-accumulate module it is also cycled back to the tail of the buffer's allocated space;
The data-update state exists only in the feature-map data buffer; when the data currently output no longer needs to be reused, it enters the multiply-accumulate module while new data taken from the stride data buffer is written to the tail of the feature-map data buffer;
The half-cycle state exists only in the feature-map data buffer and follows the data-update state; the data currently output enters the multiply-accumulate module and is simultaneously returned to the position in the buffer just before the newly updated data;
In the no-cycle state, the data currently output no longer needs to be recycled; it enters the multiply-accumulate module only and is not returned to the buffer.
Referring to Fig. 4, multiple computing units are interconnected in a row-stationary dataflow to generate a computing array. The scale of the computing array is determined by the hardware resources, the target network model, and the computational performance required of the system. The width of the computing array is K, where K must be greater than or equal to the maximum convolution kernel size Kmax in the network model, and greater than or equal to the sum of the kernel sizes that must be computed in parallel when one convolutional layer contains kernels of different sizes. The base length of the computing array is H, the minimum output feature-map size over all convolutional layers in the network model; the actual array length can be extended in multiples of 2^n according to the available hardware resources and the computational performance required of the system. When kernels of different sizes in a convolutional layer must be computed in parallel, with kernel sizes K1, K2, …, Ki satisfying K1 + K2 + … + Ki ≤ K, the computing array is divided laterally into i regions of sizes K1*H, K2*H, …, Ki*H; different regions receive different convolution-type parameters, and the computing units of each region configure their own storage and computing modules, completing the parallel computation of kernels of several sizes.
A construction method of the flexibly configurable neural network computing unit of the present invention comprises the following steps:
Step one: extract the network parameters from the target network model;
Step two: design the configurable storage module of the neural network computing unit according to step one; it stores the partial feature-map data and weight data used for computation and comprises: a feature-map data buffer, a stride data buffer, and a weight data buffer;
Step three: design the configurable control module of the neural network computing unit according to step one; for different convolution modes the configurable control module configures different buffer sizes in the storage module, generates the various working modes each buffer uses during convolution, and controls each buffer to work in the corresponding mode; the configurable control module comprises: a counter module and a state-machine module;
Step four: design the multiply-accumulate module of the neural network computing unit according to step one; it multiplies feature-map data by weights and accumulates the products into partial sums of the convolution result, and comprises a time-multiplexable multiplier and adder;
Step five: combining steps two, three, and four, supply the configurable control module of the neural network computing unit with five convolution layer parameters through the external input ports: convolution kernel size k, kernel stride s, output feature-map size h, number of mapped input channels a, and number of mapped output channels b. The configurable control module configures, for the configurable storage module, the buffer space required by the convolution of this layer, and controls the configurable storage module to output the corresponding data to the multiply-accumulate module. By supplying the corresponding convolution parameters to the computing unit, different convolutional layers each complete their share of the convolution on the same neural network computing unit.
The present invention provides a flexibly configurable neural network computing unit, a computing array, and a construction method thereof. For each type of convolution mode, only a few convolution parameters need to be supplied to the computing unit; the computing unit configures its internal storage and computing modules by itself, and the convolution computation of the corresponding mode is completed once feature maps and weight data are fed to the computing unit. This greatly improves the flexibility of the hardware implementation of the convolutional layer in convolutional neural networks and facilitates its rapid deployment on hardware.
In the present invention, the external inputs supply the convolution layer parameters to the control module of the computing unit; the control module configures the storage space appropriately and controls the storage module to work in the corresponding mode. Each buffer in the storage module has several working modes, so that shared data can be exploited to the maximum. Different convolutional layers can be completed on the same computing array built from the configurable computing units, and by dividing the array into regions, the parallel convolution of kernels of different sizes can be carried out on the array. The present invention supports convolution computations of arbitrary type as well as the parallel computation of kernels of several sizes, fully exploits the flexibility and data reusability of the convolutional neural network computing unit, greatly reduces the system power consumption caused by data movement, and improves the computational efficiency of the system.

Claims (10)

1. A flexibly configurable neural network computing unit, characterized by comprising: a configurable storage module, a configurable control module, and a time-multiplexable multiply-accumulate module;
The configurable storage module comprises: a feature-map data buffer, a stride data buffer, and a weight data buffer;
The configurable control module comprises: a counter module and a state-machine module;
The multiply-accumulate module comprises: a multiplier and an accumulator.
2. The flexibly configurable neural network computing unit according to claim 1, characterized in that the feature-map data buffer stores the partial feature-map data used during a convolution computation and recycles feature-map data that is shared; the maximum length of the buffer is L1 = max{K1A1, K2A2, …, KiAi}, where K is the convolution kernel size of a convolutional layer, A is the number of input channels to be mapped within the computing unit, and i is the index of the convolutional layer in the target network;
The stride data buffer supplies the data that must be refreshed in the feature-map buffer when the convolution kernel slides by one stride; the maximum length of the buffer is L2 = max{S1A1, S2A2, …, SiAi}, where S is the stride of the convolution kernel in a convolutional layer;
The weight data buffer stores the weight data and allows it to be reused; its length is L3 = max{K1A1B1, K2A2B2, …, KiAiBi}, where B is the number of output channels to be mapped within the computing unit.
3. The flexibly configurable neural network computing unit according to claim 1, characterized in that the counter module comprises an input-data counter, an input-weight counter, an output-data counter, an output-channel counter, and an output-feature-map-size counter;
The state-machine module contains, for each convolution kernel size, a corresponding feature-map-buffer state machine and weight-buffer state machine; each state machine performs its state transitions according to the values of the counters in the counter module.
4. The flexibly configurable neural network computing unit according to claim 1, characterized in that the computing unit is equipped with a feature-map data input port and a weight data input port;
The feature-map data input port is connected to the input of the first selector; the two outputs of the first selector are connected to the input of the stride data buffer and to the first input of the second selector respectively; the output of the stride data buffer is connected to the second input of the second selector; the output of the second selector is connected to the input of the feature-map data buffer;
The weight data input port is connected to the input of the weight data buffer;
The outputs of the feature-map data buffer and the weight data buffer are connected to the two inputs of the multiplier; the output of the multiplier is connected, through a register, the accumulator, and the fourth selector, to the output of the neural network computing unit.
5. The flexibly configurable neural network computing unit according to claim 3, characterized in that the state-machine module contains, for each convolution kernel size, a corresponding feature-map-buffer state machine and weight-buffer state machine; each state machine performs its state transitions according to the values of the counters in the counter module;
The states of the feature-map data buffer comprise: an initialization state, a data-ready state, a wait state, a full-cycle state, a data-update state, a half-cycle state, and a no-cycle state;
The states of the weight data buffer comprise: an initialization state, a data-ready state, a wait state, a full-cycle state, and a no-cycle state.
6. The flexibly configurable neural network computing unit according to claim 5, characterized in that:
in the initialization state, no data has yet entered the computing unit;
in the data-ready state, data is entering the computing unit but the amount of input data is not yet sufficient to start computing;
in the wait state, convolution kernels of different sizes are computed in parallel and the output results must stay synchronized; a computing unit responsible for a smaller kernel has less work to do and therefore waits for the computing units handling larger kernels;
in the full-cycle state, the data currently output by the buffer will be reused; while entering the multiply-accumulate module it is also cycled back to the tail of the buffer's allocated space;
the data-update state exists only in the feature-map data buffer; when the data currently output no longer needs to be reused, it enters the multiply-accumulate module while new data taken from the stride data buffer is written to the tail of the feature-map data buffer;
the half-cycle state exists only in the feature-map data buffer and follows the data-update state; the data currently output enters the multiply-accumulate module and is simultaneously returned to the position in the buffer just before the newly updated data;
in the no-cycle state, the data currently output no longer needs to be recycled; it enters the multiply-accumulate module only and is not returned to the buffer.
7. A computing array, characterized in that it is generated from flexibly configurable neural network computing units according to any one of claims 1 to 6; the computing array is divided into regions, different regions can be supplied with different convolution layer parameters, and the parallel computation of convolution modes of different kinds is completed.
8. The computing array according to claim 7, characterized in that it is generated by connecting the flexibly configurable neural network computing units of any one of claims 1 to 6 in a row-stationary dataflow; the scale of the computing array is determined by the hardware resources, the target network model, and the computational performance required of the system; the width of the computing array is K, where K must be greater than or equal to the maximum convolution kernel size Kmax in the network model, and greater than or equal to the sum of the kernel sizes that must be computed in parallel when one convolutional layer contains kernels of different sizes; the base length of the computing array is H, the minimum output feature-map size over all convolutional layers in the network model; the actual array length is extended in multiples of 2^n according to the available hardware resources and the computational performance required of the system; when kernels of different sizes in a convolutional layer must be computed in parallel, with kernel sizes K1, K2, …, Ki satisfying K1 + K2 + … + Ki ≤ K, the computing array is divided laterally into i regions of sizes K1*H, K2*H, …, Ki*H; different regions receive different convolution-type parameters, and the computing units of each region configure their own storage and computing modules, completing the parallel computation of kernels of several sizes.
9. A construction method of a flexibly configurable neural network computing unit, characterized by comprising the following steps:
Step one: extract the network parameters from the target network model;
Step two: design the configurable storage module of the neural network computing unit according to step one; it stores the partial feature-map data and weight data used for computation and comprises: a feature-map data buffer, a stride data buffer, and a weight data buffer;
Step three: design the configurable control module of the neural network computing unit according to step one; for different convolution modes the configurable control module configures different buffer sizes in the storage module, generates the various working modes each buffer uses during convolution, and controls each buffer to work in the corresponding mode; the configurable control module comprises: a counter module and a state-machine module;
Step four: design the multiply-accumulate module of the neural network computing unit according to step one; it multiplies feature-map data by weights and accumulates the products into partial sums of the convolution result, and comprises a time-multiplexable multiplier and adder;
Step five: combining steps two, three, and four, supply the configurable control module of the neural network computing unit with five convolution layer parameters through the external input ports: convolution kernel size k, kernel stride s, output feature-map size h, number of mapped input channels a, and number of mapped output channels b; the configurable control module configures, for the configurable storage module, the buffer space required by the convolution of this layer, and controls the configurable storage module to output the corresponding data to the multiply-accumulate module; by supplying the corresponding convolution parameters to the computing unit, different convolutional layers each complete their share of the convolution on the same neural network computing unit.
10. The construction method of a flexibly configurable neural network computing unit according to claim 9, characterized in that:
Step one specifically comprises: according to the target network model, extract the required parameters, including the convolution kernel size Ki and sliding stride Si of each convolutional layer, the output feature-map size Hi of each convolutional layer, and the numbers of input and output channels Ai and Bi to be mapped within the computing unit for each convolutional layer, where i is the index of the convolutional layer;
In step two: the feature-map data buffer stores the partial pixel data used during a convolution computation and recycles pixel data that is shared; its length is max{K1A1, K2A2, …, KiAi}; the stride data buffer supplies the data that must be refreshed in the feature-map buffer when the convolution kernel slides by one stride; its length is max{S1A1, S2A2, …, SiAi}; the weight data buffer stores the weight data and allows it to be reused; its length is max{K1A1B1, K2A2B2, …, KiAiBi};
In step three: the counter module comprises an input-data counter, an input-weight counter, an output-data counter, an output-channel counter, and an output-feature-map-size counter; the state-machine module contains, for each convolution kernel size, a corresponding feature-map-buffer state machine and weight-buffer state machine; each state machine performs its state transitions according to the values of the counters in the counter module; the states comprise: an initialization state, a data-ready state, a wait state, a full-cycle state, a data-update state, a half-cycle state, and a no-cycle state.
CN201811133940.2A 2018-09-27 2018-09-27 Flexibly configurable neural network computing unit, computing array and construction method thereof Active CN109409512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811133940.2A CN109409512B (en) 2018-09-27 2018-09-27 Flexibly configurable neural network computing unit, computing array and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811133940.2A CN109409512B (en) 2018-09-27 2018-09-27 Flexibly configurable neural network computing unit, computing array and construction method thereof

Publications (2)

Publication Number Publication Date
CN109409512A true CN109409512A (en) 2019-03-01
CN109409512B CN109409512B (en) 2021-02-19

Family

ID=65465369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811133940.2A Active CN109409512B (en) 2018-09-27 2018-09-27 Flexibly configurable neural network computing unit, computing array and construction method thereof

Country Status (1)

Country Link
CN (1) CN109409512B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110029471A1 (en) * 2009-07-30 2011-02-03 Nec Laboratories America, Inc. Dynamically configurable, multi-ported co-processor for convolutional neural networks
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN107918794A (en) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 Neural network processor based on computing array

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334798B (en) * 2019-03-13 2021-06-08 北京地平线机器人技术研发有限公司 Feature data extraction method and device and instruction generation method and device
CN109656623A (en) * 2019-03-13 2019-04-19 北京地平线机器人技术研发有限公司 It executes the method and device of convolution algorithm operation, generate the method and device of instruction
CN110334798A (en) * 2019-03-13 2019-10-15 北京地平线机器人技术研发有限公司 Characteristic extracting method and device, instruction generation method and device
CN109656623B (en) * 2019-03-13 2019-06-14 北京地平线机器人技术研发有限公司 It executes the method and device of convolution algorithm operation, generate the method and device of instruction
CN110084739A (en) * 2019-03-28 2019-08-02 东南大学 A kind of parallel acceleration system of FPGA of the picture quality enhancement algorithm based on CNN
CN110070178A (en) * 2019-04-25 2019-07-30 北京交通大学 A kind of convolutional neural networks computing device and method
WO2021144126A1 (en) * 2020-01-15 2021-07-22 Graphcore Limited Control of data transfer between processors
US11625357B2 (en) 2020-01-15 2023-04-11 Graphcore Limited Control of data transfer between processors
CN113807506A (en) * 2020-06-11 2021-12-17 杭州知存智能科技有限公司 Data loading circuit and method
US11977969B2 (en) 2020-06-11 2024-05-07 Hangzhou Zhicun Intelligent Technology Co., Ltd. Data loading
CN111610963A (en) * 2020-06-24 2020-09-01 上海西井信息科技有限公司 Chip structure and multiply-add calculation engine thereof
CN112418418A (en) * 2020-11-11 2021-02-26 江苏禹空间科技有限公司 Data processing method and device based on neural network, storage medium and server
CN112346704B (en) * 2020-11-23 2021-09-17 华中科技大学 Full-streamline type multiply-add unit array circuit for convolutional neural network
CN112346704A (en) * 2020-11-23 2021-02-09 华中科技大学 Full-streamline type multiply-add unit array circuit for convolutional neural network
CN113138957A (en) * 2021-03-29 2021-07-20 北京智芯微电子科技有限公司 Chip for neural network inference and method for accelerating neural network inference
CN113592067A (en) * 2021-07-16 2021-11-02 华中科技大学 Configurable convolution calculation circuit for convolution neural network
CN113592067B (en) * 2021-07-16 2024-02-06 华中科技大学 Configurable convolution calculation circuit for convolution neural network
WO2023115529A1 (en) * 2021-12-24 2023-06-29 华为技术有限公司 Data processing method in chip, and chip

Also Published As

Publication number Publication date
CN109409512B (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN109409512A (en) A kind of neural computing unit, computing array and its construction method of flexibly configurable
AU2019442319B2 (en) Structural topology optimization method based on material-field reduction series expansion
CN110390384B (en) Configurable general convolutional neural network accelerator
CN108171317B (en) Data multiplexing convolution neural network accelerator based on SOC
CN111667051A (en) Neural network accelerator suitable for edge equipment and neural network acceleration calculation method
Liu et al. Social learning discrete Particle Swarm Optimization based two-stage X-routing for IC design under Intelligent Edge Computing architecture
CN110458279A (en) A kind of binary neural network accelerated method and system based on FPGA
CA3124369A1 (en) Neural network processor
CN108537331A (en) A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic
CN106844703A (en) A kind of internal storage data warehouse query processing implementation method of data base-oriented all-in-one
Jin et al. A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs
CN109472361A (en) Neural network optimization
CN101882238A (en) Wavelet neural network processor based on SOPC (System On a Programmable Chip)
CN108228970B (en) Structural dynamics analysis explicit different step length parallel computing method
CN106415526B (en) Fft processor and operation method
CN106294278B (en) Adaptive hardware for dynamic reconfigurable array computing system is pre-configured controller
CN105574809A (en) Matrix exponent-based parallel calculation method for electromagnetic transient simulation graphic processor
CN103399927A (en) Distributed computing method and device
CN107064930A (en) Radar foresight imaging method based on GPU
Vartziotis et al. Improved GETMe by adaptive mesh smoothing
CN111079078B (en) Lower triangular equation parallel solving method for structural grid sparse matrix
CN114911619A (en) Batch parallel LU decomposition method of small and medium-sized dense matrix based on GPU for simulation system
CN103838680A (en) Data caching method and device
Sait et al. Optimization of FPGA-based CNN accelerators using metaheuristics
CN111339688B (en) Method for solving rocket simulation model time domain equation based on big data parallel algorithm

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant