CN109800857A

CN109800857A - Dilated (atrous) convolution acceleration system and method

Info

Publication number: CN109800857A
Application number: CN201811573074.9A
Authority: CN
Inventors: Kong Wenhai (孔文海)
Original assignee: Zhuhai Wisdom Electronic Technology Co Ltd
Current assignee: Zhuhai Wisdom Electronic Technology Co Ltd
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2019-05-24

Abstract

The present invention provides a dilated convolution acceleration system comprising a preparation module, a parameter setting module, a generation module, a grouping module, a computing module, and an accumulation module. A dilated convolution acceleration method comprises: S1, inputting the original convolution and the convolution kernel to be used into the system; S2, inputting the dilation rate and the padding into the system; S3, converting the original convolution into a dilated convolution according to the dilation rate and the padding; S4, grouping the dilated convolution; S5, performing operations on the grouped convolution; S6, aggregating the results of each group and outputting the final result. The invention improves the utilization of data and multipliers during dilated convolution computation, thereby effectively improving the computational efficiency of the overall system.

Description

Dilated convolution acceleration system and method

Technical field

The present invention relates to a dilated convolution acceleration system and method, and belongs to the field of deep neural networks.

Background

Machine learning has developed rapidly in recent years and has performed well in many fields. As networks grow deeper, the amount of computation grows accordingly. Against this background, deep neural network accelerators have emerged, and the present invention provides a method for accelerating dilated convolution operations in a deep neural network accelerator.

Dilated convolution addresses the problem of enlarging the receptive field in deep neural networks and is mainly used in object detection and segmentation. A relatively successful application is SSD (Single Shot MultiBox Detector), which achieves end-to-end direct prediction within the network by regressing object bounding boxes. This approach has relatively low complexity and effectively improves speed while simplifying the object detection pipeline. A typical convolutional network performs convolution by multiplying a matrix by a matrix or a vector, and the data is mainly prepared by routines such as im2col in the Caffe project. In matrix-matrix multiplication, the numbers of rows and columns must be considered, because they determine how many multipliers are required. Dilated convolution differs from ordinary convolution in that the data needed for a single convolution computation is non-contiguous: dilated convolution inserts invalid holes into the original kernel, so conventional multipliers are poorly utilized. How to effectively improve the computational efficiency of dilated convolution is therefore a pressing technical problem.
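The non-contiguous access pattern described above shows up in a small im2col-style sketch. It is a simplified single-channel illustration in the spirit of Caffe's im2col, not Caffe's actual code; `dilated_patch` and `dilated_im2col` are hypothetical names, and stride 1 with no padding is assumed.

```python
from typing import List

Matrix = List[List[int]]


def dilated_patch(x: Matrix, row: int, col: int, k: int, d: int) -> List[int]:
    """Gather the k*k input values that one output position is multiplied against.

    (row, col) is the top-left corner of the receptive field. Taps are d elements
    apart, so for d > 1 they come from non-contiguous positions of x.
    """
    return [x[row + i * d][col + j * d] for i in range(k) for j in range(k)]


def dilated_im2col(x: Matrix, k: int, d: int) -> List[List[int]]:
    """im2col-style patch matrix: single channel, stride 1, no padding."""
    span = k + (k - 1) * (d - 1)  # effective kernel size after dilation
    out_h = len(x) - span + 1
    out_w = len(x[0]) - span + 1
    return [dilated_patch(x, r, c, k, d) for r in range(out_h) for c in range(out_w)]


x = [[5 * r + c for c in range(5)] for r in range(5)]  # 5 x 5 input holding 0..24
print(dilated_im2col(x, k=3, d=1)[0])  # [0, 1, 2, 5, 6, 7, 10, 11, 12]
print(dilated_im2col(x, k=3, d=2))     # [[0, 2, 4, 10, 12, 14, 20, 22, 24]]
```

With d = 1 each patch reads three contiguous runs of three values; with d = 2 every value in the patch is separated from its neighbour by a skipped element, which is the access pattern that makes dense multiplier arrays hard to keep busy.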

Summary of the invention

To solve the above problems, the present invention provides a dilated convolution acceleration system comprising the following modules:

Preparation module, for inputting the original convolution and the convolution kernel to be used into the system;

Parameter setting module, for inputting the dilation rate and the padding into the system;

Generation module, for converting the original convolution into a dilated convolution according to the dilation rate and the padding;

Grouping module, for grouping the dilated convolution;

Computing module, for performing operations on the grouped convolution;

Accumulation module, for aggregating the results of each group and outputting the final result.

Further, the convolution kernel is a square matrix, with equal numbers of rows and columns.

Further, the convolution kernel includes 2 × 2, 3 × 3, 4 × 4, and 5 × 5 matrices.

Further, the dilation rate and the padding must be equal.

Further, zeros are padded into the original convolution according to the padding.

Further, the padded zeros appear only in the first and last columns, or the first and last rows, of the matrix.

Further, when the dilated convolution is grouped, if zeros appear in a row or column of a region to be grouped, the region containing the zeros is placed in a separate group.

Further, when operating on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination.
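One way to see why the padding is set equal to the dilation rate is the standard output-size formula for dilated convolution. The sketch below assumes stride 1, symmetric padding, and the preferred 3 × 3 kernel; the function names are hypothetical. For a k × k kernel the size-preserving padding is (k - 1) × d / 2, so "padding = dilation rate" is exactly the size-preserving choice only when k = 3.

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Span of a k x k kernel with dilation rate d (d = 1 is ordinary convolution)."""
    return k + (k - 1) * (d - 1)


def conv_output_size(n: int, k: int, d: int, p: int, s: int = 1) -> int:
    """Output length along one axis: input length n, padding p per side, stride s."""
    return (n + 2 * p - effective_kernel_size(k, d)) // s + 1


# 3 x 3 kernel with dilation rate 6 on a 19-wide input, as in the SSD-VGG300 example
print(effective_kernel_size(3, 6))        # 13
print(conv_output_size(19, 3, 6, p=6))    # 19  (padding == dilation keeps the size)
print(conv_output_size(19, 3, 6, p=0))    # 7   (no padding shrinks the output)
```

For a 3 × 3 kernel the effective span is 2d + 1, so padding d on each side exactly cancels the shrinkage for any input size.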

A dilated convolution acceleration method comprises the following steps:

S1, inputting the original convolution and the convolution kernel to be used into the system;

S2, inputting the dilation rate and the padding into the system;

S3, converting the original convolution into a dilated convolution according to the dilation rate and the padding;

S4, grouping the dilated convolution;

S5, performing operations on the grouped convolution;

S6, aggregating the results of each group and outputting the final result.
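The patent does not specify how the groups map onto multipliers, so steps S1 to S6 above are sketched here only under one plausible reading: for each output position, the kernel taps that land on padded zeros form their own group that is never sent to the multipliers, while the remaining taps are multiplied and accumulated. The sketch assumes a single channel, stride 1, and the rule that the dilation rate equals the padding; `pad_zeros`, `naive_dilated_conv`, and `grouped_dilated_conv` are hypothetical names. Comparing against a naive reference confirms that skipping the padding group leaves the result unchanged.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pad_zeros(x: Matrix, p: int) -> Tuple[Matrix, List[List[bool]]]:
    """S3: surround x with p rows/columns of zeros; also return a mask of real data."""
    h, w = len(x), len(x[0])
    padded = [[0.0] * (w + 2 * p) for _ in range(h + 2 * p)]
    is_data = [[False] * (w + 2 * p) for _ in range(h + 2 * p)]
    for r in range(h):
        for c in range(w):
            padded[r + p][c + p] = x[r][c]
            is_data[r + p][c + p] = True
    return padded, is_data


def naive_dilated_conv(x: Matrix, w: Matrix, d: int, p: int) -> Matrix:
    """Reference: multiply every tap, including those that land on padded zeros."""
    k = len(w)
    xp, _ = pad_zeros(x, p)
    span = k + (k - 1) * (d - 1)
    out_h, out_w = len(xp) - span + 1, len(xp[0]) - span + 1
    return [[sum(w[i][j] * xp[r + i * d][c + j * d] for i in range(k) for j in range(k))
             for c in range(out_w)] for r in range(out_h)]


def grouped_dilated_conv(x: Matrix, w: Matrix, d: int, p: int) -> Tuple[Matrix, int]:
    """S1-S6 sketch: taps on padding form their own group and are never multiplied."""
    if d != p:  # S2: the patent requires dilation rate == padding
        raise ValueError("dilation rate and padding must be equal")
    k = len(w)
    xp, is_data = pad_zeros(x, p)  # S3
    span = k + (k - 1) * (d - 1)
    out_h, out_w = len(xp) - span + 1, len(xp[0]) - span + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    multiplies = 0
    for r in range(out_h):
        for c in range(out_w):
            # S4: keep only the group of taps that land on real input data
            data_group = [(i, j) for i in range(k) for j in range(k)
                          if is_data[r + i * d][c + j * d]]
            # S5 + S6: multiply that group and accumulate into the output
            out[r][c] = sum(w[i][j] * xp[r + i * d][c + j * d] for i, j in data_group)
            multiplies += len(data_group)
    return out, multiplies


x = [[(7 * r + 3 * c) % 11 for c in range(19)] for r in range(19)]  # 19 x 19 test input
w = [[1, 2, 1], [0, 1, 0], [2, 1, 2]]
out, multiplies = grouped_dilated_conv(x, w, d=6, p=6)
print(out == naive_dilated_conv(x, w, d=6, p=6))  # True
print(multiplies, 19 * 19 * 9)                     # 2025 3249
```

For this 19 × 19 case with dilation rate 6, the reading above skips about 38% of the multiplications (2025 of 3249 are performed); it says nothing about how an actual accelerator schedules the remaining work.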

The beneficial effects of the present invention are that the utilization of data and multipliers during dilated convolution computation is improved, thereby effectively improving the computational efficiency of the overall system.

Description of the drawings

Fig. 1 is an overall structural diagram according to the present invention;

Fig. 2 is an overall flowchart according to the present invention;

Fig. 3 is a schematic diagram of specific embodiment A according to the present invention;

Fig. 4 is a schematic diagram of specific embodiment B according to the present invention;

Fig. 5 is a schematic diagram of specific embodiment C according to the present invention;

Fig. 6 is a schematic diagram of specific embodiment D according to the present invention;

Fig. 7 is a schematic diagram of specific embodiment E according to the present invention.

Detailed description of the embodiments

The concept, specific structure, and technical effects of the present invention are described clearly and completely below with reference to the embodiments and the accompanying drawings, so that the objects, solutions, and effects of the present invention can be fully understood.

It should be noted that, unless otherwise specified, when a feature is described as being "fixed" or "connected" to another feature, it may be fixed or connected to the other feature directly or indirectly. In addition, terms such as up, down, left, and right used in this disclosure describe only the relative positions of the components of the disclosure in the drawings. The singular forms "a", "said", and "the" used in this disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless otherwise defined, all technical and scientific terms used herein have the meaning commonly understood by those skilled in the art. The terms used in the description are intended only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.

It will be understood that although the terms first, second, third, and so on may be used in this disclosure to describe various elements, these elements should not be limited by these terms. These terms serve only to distinguish elements of the same type from one another. For example, without departing from the scope of the disclosure, a first element could be termed a second element, and similarly a second element could be termed a first element. Any and all examples or exemplary language ("for example", "such as") provided herein are intended merely to better illustrate embodiments of the invention and do not limit the scope of the invention unless otherwise claimed.

It should be understood that embodiments of the invention may be effected or implemented by computer hardware, by a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable memory. The methods may be implemented in computer programs using standard programming techniques, including a non-transitory computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and drawings described in the specific embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. If desired, however, the program may be implemented in assembly or machine language. In any case, the language may be compiled or interpreted. Furthermore, the program may run on an application-specific integrated circuit programmed for this purpose.

In addition, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or by a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.

Further, the method may be implemented on any suitable type of computing platform to which it is operably connected, including but not limited to personal computers, minicomputers, mainframes, workstations, networked or distributed computing environments, and separate or integrated computer platforms, or platforms communicating with charged-particle tools or other imaging devices. Aspects of the invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, an optically read and/or written storage medium, RAM, or ROM, so that it can be read by a programmable computer and, when read by the computer, used to configure and operate the computer to perform the processes described herein. In addition, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media contain instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor, the invention described herein includes these and other types of non-transitory computer-readable storage media. When the computer is programmed according to the methods and techniques of the invention, the invention also includes the computer itself.

A computer program can be applied to input data to perform the functions described herein, thereby transforming the input data to generate output data that is stored in non-volatile memory. The output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects generated on a display.

It should be noted that, to make the technical solution easier to understand, the matrices used in the convolution operations in the embodiments are N × 3 matrices, that is, matrices with any number of rows and exactly 3 columns. This does not limit the scope of application of the technical solution; in practice, the numbers of rows and columns of applicable matrices are not restricted.

Referring to Fig. 1, which shows the overall structure according to the present invention, the system comprises the following modules:

Preparation module, for inputting the original convolution and the convolution kernel to be used into the system; the convolution kernel is a square matrix with equal numbers of rows and columns; it includes 2 × 2, 3 × 3, 4 × 4, and 5 × 5 matrices, and a 3 × 3 kernel is generally preferred;

Parameter setting module, for inputting the dilation rate and the padding into the system; the dilation rate and the padding must be equal. It should be noted that when the dilation rate is input, the values in the original matrix region are multiplied by the dilation rate to obtain the corresponding values in the dilated convolution; the padding is the number of zeros padded into the original convolution;

Generation module, for converting the original convolution into a dilated convolution according to the dilation rate and the padding; zeros are padded into the original convolution according to the padding, and the padded zeros appear only in the first and last columns, or the first and last rows, of the matrix;

Grouping module, for grouping the dilated convolution; when grouping, if zeros appear in a row or column of a region to be grouped, the region containing the zeros is placed in a separate group; the other regions are grouped according to the actual data-fetching requirements;

Computing module, for performing operations on the grouped convolution; when operating on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination.
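The grouping rule applied by the grouping module above can be sketched in a few lines. This assumes padded positions are tracked explicitly (here as `None`, mirroring the "X" used in the figures) rather than detected by value, since a genuine zero in the data would otherwise be indistinguishable from padding; `group_rows` and `compact` are hypothetical names. The `compact` step mirrors the later embodiments, where a row containing an X shrinks from three values to two.

```python
from typing import List, Optional, Tuple

Row = List[Optional[int]]  # None marks a padded position (an "X" in the figures)


def group_rows(region: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split a pre-grouping region into rows of pure data and rows that touch padding."""
    data_rows = [row for row in region if None not in row]
    padded_rows = [row for row in region if None in row]
    return data_rows, padded_rows


def compact(row: Row) -> List[int]:
    """Drop padded positions, e.g. a 3-value row with one X becomes a 2-value row."""
    return [v for v in row if v is not None]


region = [[None, 1, 2], [3, 4, 5], [6, 7, None]]
data_rows, padded_rows = group_rows(region)
print(data_rows)                          # [[3, 4, 5]]
print([compact(r) for r in padded_rows])  # [[1, 2], [6, 7]]
```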

Referring to Fig. 2, which shows the overall flowchart according to the present invention, the method comprises the following steps:

S1, inputting the original convolution and the convolution kernel to be used into the system; the convolution kernel is a square matrix with equal numbers of rows and columns, including 2 × 2, 3 × 3, 4 × 4, and 5 × 5 matrices;

S2, inputting the dilation rate and the padding into the system; the dilation rate and the padding must be equal;

S3, converting the original convolution into a dilated convolution according to the dilation rate and the padding; zeros are padded into the original convolution according to the padding, and the padded zeros appear only in the first and last columns, or the first and last rows, of the matrix;

S4, grouping the dilated convolution; when grouping, if zeros appear in a row or column of a region to be grouped, the region containing the zeros is placed in a separate group;

S5, performing operations on the grouped convolution; when operating on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination;

Fig. 3 shows a schematic diagram of specific embodiment A according to the present invention. Embodiment A illustrates how the matrix multiplications of dilated and non-dilated convolution are grouped in the row direction. In the schematic, the left table shows the original convolution, with no dilation inserted; the middle table shows how the matrix changes when the input dilation rate is 6: the first column is unchanged, the second column is 6 times the original data, and likewise for the third column; the right table shows the structure of the corresponding convolution kernel when a 3 × 3 filter is used, that is, the 3 × 3 convolution kernel mentioned above.
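Fig. 3 is not reproduced here, so the middle table is known only from the description above. Reading it as the column positions of the kernel taps (an interpretation, not confirmed by the text), multiplying by the dilation rate maps columns 0, 1, 2 of a 3 × 3 kernel to 0, 6, 12, which leaves the first column unchanged and scales the other two by 6. The helper names are hypothetical.

```python
from typing import List, Tuple


def dilated_offsets(k: int, d: int) -> List[int]:
    """Positions of the k kernel columns (or rows) after scaling by the dilation rate d."""
    return [i * d for i in range(k)]


def dilated_taps(k: int, d: int) -> List[Tuple[int, int]]:
    """All (row, col) tap positions of a k x k kernel with dilation rate d."""
    offs = dilated_offsets(k, d)
    return [(r, c) for r in offs for c in offs]


print(dilated_offsets(3, 1))   # [0, 1, 2]   original convolution
print(dilated_offsets(3, 6))   # [0, 6, 12]  dilation rate 6: first column unchanged
print(dilated_taps(3, 6)[:3])  # [(0, 0), (0, 6), (0, 12)]
```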

Fig. 4 shows a schematic diagram of specific embodiment B according to the present invention, in which the data of the data matrix is scanned column by column.
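Fig. 4 is not reproduced, so the exact scan used in embodiment B is not known. As a minimal illustration of what scanning "with columns as the direction" means, the hypothetical helper below yields a matrix's elements in column-major order.

```python
from typing import Iterator, List


def column_scan(m: List[List[int]]) -> Iterator[int]:
    """Yield the elements of m column by column (column-major order)."""
    for c in range(len(m[0])):
        for r in range(len(m)):
            yield m[r][c]


m = [[1, 2, 3],
     [4, 5, 6]]
print(list(column_scan(m)))  # [1, 4, 2, 5, 3, 6]
```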

In the SSD-VGG300 network, suppose the input to the dilated convolution has size 19 × 19 × 512, so that 19 valid values (data points) must be selected. Figs. 5 and 6 show the computation processes of the traditional approach and the grouped approach, respectively, under the SSD-VGG300 network. In the traditional approach the valid values or data points are 0 to 18; in the grouped approach they are 6 to 24. Assume that the maximum compute provides 12 × 3 × n multiplications and that the data RAM can fetch at most 14 groups of data.
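The two index ranges above (0 to 18 versus 6 to 24) are consistent with the same 19 data points being shifted by a padding of 6 on each side. A short sketch under that reading, with a hypothetical helper name:

```python
def data_index_range(n: int, p: int) -> range:
    """Indices occupied by real data along one axis after padding p zeros on each side."""
    return range(p, p + n)


traditional = data_index_range(19, 0)  # no padding: indices 0..18
grouped = data_index_range(19, 6)      # padding 6: indices 6..24
print(traditional[0], traditional[-1], len(traditional))  # 0 18 19
print(grouped[0], grouped[-1], len(grouped))              # 6 24 19
```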

With the traditional approach, 10 computations are needed, that is, the 19 required valid values or data points can be fetched only after 10 rounds, and each computation amounts to 2 × 3 × n multiplications.

When the values are converted into a dilated convolution, the dilation rate and the padding are both 6, and an X in the table marks a position with no data, that is, the padded zeros mentioned above. When such a position appears in a row, that row is reduced from three values to two. Following the grouping approach described above, regions in which zeros appear are placed in a separate group. After grouping, only 6 computations are needed, each amounting to at most 6 × 2 × n multiplications, so the grouped approach is 40% more efficient than the traditional approach (6 computations instead of 10).

Fig. 7 shows a schematic diagram of specific embodiment D according to the present invention. Embodiment D illustrates that, in the SSD-VGG200 network, the data input to the dilated convolution has size 12 × 12 × 512, that is, 19 valid values or data points must be selected; the dilation rate and the padding are both 6; an X in the table marks a position with no data; and a row containing such a position is reduced from three values to two. Again assume that the maximum compute provides 12 × 3 × n multiplications and that the data RAM can fetch at most 14 groups of data. With this method the data is divided into two groups, and a single computation completes all 12 × 2 × n multiplications, giving a multiplier utilization of 66.7%. The traditional approach would need 6 computations, each with a utilization of only one sixth. The utilization of the proposed method is therefore 4 times that of the conventional method, and its computational performance is 6 times that of the conventional method, a highly significant improvement.
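The percentages in the two SSD examples above follow from simple ratios, since the channel factor n cancels. The sketch below reproduces them under stated assumptions: the 12 × 3 × n multiplier budget is taken from the text, reading the traditional 10 passes as ceil(19 / 2) (two data points per pass) is an interpretation, and `BUDGET` and `utilization` are hypothetical names.

```python
import math
from fractions import Fraction

BUDGET = 12 * 3  # assumed multiplier budget per pass, in units of n (12 x 3 x n)


def utilization(rows: int, cols: int) -> Fraction:
    """Share of the 12 x 3 x n budget used by one pass of rows x cols x n multiplies."""
    return Fraction(rows * cols, BUDGET)


# 19 x 19 x 512 example (Figs. 5 and 6)
trad_passes = math.ceil(19 / 2)  # reading: 2 data points per traditional pass -> 10
grouped_passes = 6               # as stated for the grouped scheme
print(trad_passes, Fraction(trad_passes - grouped_passes, trad_passes))  # 10 2/5
print(utilization(2, 3), utilization(6, 2))                              # 1/6 1/3

# Embodiment D (Fig. 7): one 12 x 2 x n grouped pass versus 6 traditional passes
grouped_util, trad_util = utilization(12, 2), utilization(2, 3)
trad_passes_d, grouped_passes_d = 6, 1
print(format(float(grouped_util), ".1%"))   # 66.7%
print(grouped_util / trad_util)             # 4  (utilization ratio)
print(trad_passes_d // grouped_passes_d)    # 6  (performance ratio)
```

Exact fractions avoid rounding drift, so the 40%, 66.7%, 4× and 6× figures can be checked directly rather than approximately.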

The above are only preferred embodiments of the present invention, and the invention is not limited to them. Any modification, equivalent replacement, improvement, and the like that achieves the technical effects of the invention by the same means and is made within the spirit and principles of the present invention shall fall within the protection scope of the present invention. Within that scope, the technical solutions and/or implementations may be modified and varied in many different ways.

Claims

1. A dilated convolution acceleration system, characterized in that the system comprises the following modules:

a parameter setting module, for inputting the dilation rate and the padding into the system;

a grouping module, for grouping the dilated convolution;

2. The dilated convolution acceleration system according to claim 1, characterized in that the convolution kernel is a square matrix with equal numbers of rows and columns.

3. The dilated convolution acceleration system according to claim 2, characterized in that the convolution kernel includes 2 × 2, 3 × 3, 4 × 4, and 5 × 5 matrices.

4. The dilated convolution acceleration system according to claim 1, characterized in that the dilation rate and the padding must be equal.

5. The dilated convolution acceleration system according to claim 1, characterized in that zeros are padded into the original convolution according to the padding.

6. The dilated convolution acceleration system according to claim 5, characterized in that the padded zeros appear only in the first and last columns, or the first and last rows, of the matrix.

7. The dilated convolution acceleration system according to claim 1, characterized in that, when the dilated convolution is grouped, if zeros appear in a row or column of a region to be grouped, the region containing the zeros is placed in a separate group.

8. The dilated convolution acceleration system according to claim 1, characterized in that, when operating on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination.

9. A dilated convolution acceleration method, characterized in that the method comprises the following steps:

S2, inputting the dilation rate and the padding into the system;

S4, grouping the dilated convolution;

S5, performing operations on the grouped convolution;