CN109800857A - Dilated convolution acceleration system and method - Google Patents

Info

Publication number
CN109800857A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811573074.9A
Other languages
Chinese (zh)
Inventor
孔文海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Wisdom Electronic Technology Co Ltd
Original Assignee
Zhuhai Wisdom Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-21
Filing date
2018-12-21
Publication date
2019-05-24
Application filed by Zhuhai Wisdom Electronic Technology Co Ltd
Priority to CN201811573074.9A
Publication of CN109800857A

Abstract

The present invention provides a dilated convolution acceleration system comprising a preparation module, a parameter setting module, a generation module, a grouping module, a computing module and an accumulation module. It also provides a dilated convolution acceleration method comprising: S1, inputting the original convolution and the convolution kernel to be used into the system; S2, inputting the dilation value and the padding value into the system; S3, converting the original convolution into a dilated convolution according to the dilation value and the padding value; S4, grouping the dilated convolution; S5, performing the operation on the grouped convolution; S6, collecting the operation results of all groups and outputting the final result. The invention improves the utilization of data and multipliers during dilated convolution computation, thereby effectively improving the operating efficiency of the overall system.

Description

Dilated convolution acceleration system and method
Technical field
The present invention relates to a dilated convolution acceleration system and method, and belongs to the field of deep neural networks.
Background
Machine learning has developed rapidly in recent years and has performed well in many fields. As networks grow deeper, the amount of computation grows with them. Against this background, accelerators for deep neural networks have emerged. The present invention is a method for accelerating dilated convolution operations in a deep neural network accelerator.
Dilated convolution addresses the problem of enlarging the receptive field in deep neural networks and is mainly used in object detection and segmentation. A relatively successful application is SSD (Single Shot MultiBox Detector), in which the network predicts and regresses target bounding boxes directly, end to end. This approach has relatively low complexity and effectively improves speed while simplifying the object detection pipeline. A typical convolutional network computes a convolution by multiplying a matrix by a matrix or a vector, and the data are mainly prepared by a routine such as im2col in the Caffe project. When a matrix is multiplied by a matrix, the row and column sizes must be considered, because they determine the number of multipliers required. Dilated convolution differs from ordinary convolution in that the data needed for a single convolution computation are not contiguous: dilated convolution intersperses invalid hole values into the original layout, so conventional multipliers are poorly utilized. How to effectively improve the operating efficiency of dilated convolution is the technical problem to be solved.
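To make the "non-contiguous data" point concrete, the following is a minimal reference sketch of a single-channel dilated convolution in NumPy. It is not taken from the patent: the function name `dilated_conv2d`, the stride of 1 and the single-channel layout are assumptions chosen for illustration. The strided slice shows that each output reads input samples that are `dilation` apart rather than adjacent.

```python
import numpy as np

def dilated_conv2d(x, w, dilation=1, padding=0):
    """Naive single-channel 2-D dilated convolution (cross-correlation), stride 1."""
    k = w.shape[0]                      # square k x k kernel (assumption)
    xp = np.pad(x, padding)             # zero padding on all four sides
    span = dilation * (k - 1) + 1       # width covered by the dilated kernel
    out_h = xp.shape[0] - span + 1
    out_w = xp.shape[1] - span + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Taps are `dilation` apart, so the slice is strided, not contiguous.
            patch = xp[i:i + span:dilation, j:j + span:dilation]
            out[i, j] = np.sum(patch * w)
    return out

# A 19 x 19 input with a 3 x 3 kernel, dilation 6 and padding 6 keeps its size.
x = np.arange(19 * 19, dtype=np.float64).reshape(19, 19)
w = np.ones((3, 3))
y = dilated_conv2d(x, w, dilation=6, padding=6)
print(y.shape)  # (19, 19)
```

With `dilation=1` the same function reduces to an ordinary convolution, which is the case im2col-style matrix multiplication handles efficiently.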
Summary of the invention
To solve the above problem, the present invention provides a dilated convolution acceleration system comprising the following modules:
a preparation module, for inputting the original convolution and the convolution kernel to be used into the system;
a parameter setting module, for inputting the dilation value and the padding value into the system;
a generation module, for converting the original convolution into a dilated convolution according to the dilation value and the padding value;
a grouping module, for grouping the dilated convolution;
a computing module, for performing the operation on the grouped convolution;
an accumulation module, for collecting the operation results of all groups and outputting the final result.
Further, the convolution kernel is a matrix region with equal numbers of rows and columns.
Further, the convolution kernel includes 2 × 2, 3 × 3, 4 × 4 and 5 × 5 matrices.
Further, the dilation value and the padding value must be equal.
Further, zero values are filled into the original convolution according to the padding value.
Further, the filled zero values appear only in the first and last columns, or the first and last rows, of the matrix.
Further, when the dilated convolution is grouped, if a zero value appears in a row or column of a candidate group region, the region containing the zero values is placed in a separate group.
Further, when the operation is performed on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination.
A dilated convolution acceleration method, comprising the following steps:
S1, inputting the original convolution and the convolution kernel to be used into the system;
S2, inputting the dilation value and the padding value into the system;
S3, converting the original convolution into a dilated convolution according to the dilation value and the padding value;
S4, grouping the dilated convolution;
S5, performing the operation on the grouped convolution;
S6, collecting the operation results of all groups and outputting the final result.
The beneficial effect of the present invention is that the utilization of data and multipliers is improved during dilated convolution computation, thereby effectively improving the operating efficiency of the overall system.
Brief description of the drawings
Fig. 1 is an overall structural diagram according to the present invention;
Fig. 2 is an overall flow chart according to the present invention;
Fig. 3 is a schematic diagram of specific embodiment A according to the present invention;
Fig. 4 is a schematic diagram of specific embodiment B according to the present invention;
Fig. 5 is a schematic diagram of specific embodiment C according to the present invention;
Fig. 6 is a schematic diagram of specific embodiment D according to the present invention;
Fig. 7 is a schematic diagram of specific embodiment E according to the present invention.
Detailed description of the embodiments
The concept, specific structure and technical effects of the present invention are described clearly and completely below with reference to the embodiments and drawings, so that the purpose, solutions and effects of the invention can be fully understood.
It should be noted that, unless otherwise specified, when a feature is said to be "fixed" or "connected" to another feature, it may be fixed or connected to that feature directly or indirectly. In addition, descriptions such as up, down, left and right used in this disclosure refer only to the relative positions of the components of the disclosure in the drawings. The singular forms "a", "said" and "the" used in this disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art. The terms used in this description are intended only to describe specific embodiments, not to limit the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
It should be understood that although the terms first, second, third and so on may be used in this disclosure to describe various elements, these elements should not be limited by these terms. The terms are used only to distinguish elements of the same type from one another. For example, without departing from the scope of the disclosure, a first element could be termed a second element, and similarly a second element could be termed a first element. Any and all examples or exemplary language ("for example", "such as") provided herein are intended merely to better illustrate the embodiments of the invention and, unless otherwise required, do not limit the scope of the invention.
It should be appreciated that embodiments of the invention may be implemented by computer hardware, by a combination of hardware and software, or by computer instructions stored in non-transitory computer-readable memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and drawings described in the specific embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, if desired, the program may be implemented in assembly or machine language. In any case, the language may be a compiled or interpreted language. In addition, the program may run on an application-specific integrated circuit programmed for this purpose.
In addition, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or by a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any suitable type of operably connected computing platform, including but not limited to a personal computer, minicomputer, mainframe, workstation, networked or distributed computing environment, a separate or integrated computer platform, or a platform in communication with a charged particle tool or other imaging device. Aspects of the invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, an optically readable and/or writable storage medium, RAM or ROM, so that it can be read by a programmable computer; when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. In addition, the machine-readable code, or portions of it, may be transmitted over a wired or wireless network. The invention described herein includes these and other types of non-transitory computer-readable storage media when such media contain instructions or programs that, in combination with a microprocessor or other data processor, implement the steps described above. The invention also includes the computer itself when it is programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein, thereby transforming the input data to generate output data that is stored in non-volatile memory. The output information can also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represent physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
It should be noted that, to aid understanding of the technical solution, the convolution examples in the embodiments use an N × 3 matrix, i.e. a matrix with an unrestricted number of rows and 3 columns. This does not limit the scope of application of the technical solution; the applicable numbers of rows and columns are in fact unrestricted.
Referring to Fig. 1, which shows the overall structure according to the present invention, the system comprises the following modules:
a preparation module, for inputting the original convolution and the convolution kernel to be used into the system; the convolution kernel is a matrix region with equal numbers of rows and columns; the convolution kernel includes 2 × 2, 3 × 3, 4 × 4 and 5 × 5 matrices, and a 3 × 3 convolution kernel is generally preferred;
a parameter setting module, for inputting the dilation value and the padding value into the system; the dilation value and the padding value must be equal; it should be noted that, once the dilation value is input, the values of the original matrix region are converted into the values of the dilated convolution by multiplying them by the dilation value; the padding value is the number of zero values filled into the original convolution;
a generation module, for converting the original convolution into a dilated convolution according to the dilation value and the padding value; zero values are filled into the original convolution according to the padding value; the filled zero values appear only in the first and last columns, or the first and last rows, of the matrix;
a grouping module, for grouping the dilated convolution; when the dilated convolution is grouped, if a zero value appears in a row or column of a candidate group region, the region containing the zero values is placed in a separate group; the other regions are grouped according to the actual sampling requirements;
a computing module, for performing the operation on the grouped convolution; when the operation is performed on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination;
an accumulation module, for collecting the operation results of all groups and outputting the final result.
Referring to Fig. 2, which shows the overall flow chart according to the present invention, the method comprises the following steps:
S1, inputting the original convolution and the convolution kernel to be used into the system; the convolution kernel is a matrix region with equal numbers of rows and columns, including 2 × 2, 3 × 3, 4 × 4 and 5 × 5 matrices;
S2, inputting the dilation value and the padding value into the system; the dilation value and the padding value must be equal;
S3, converting the original convolution into a dilated convolution according to the dilation value and the padding value; zero values are filled into the original convolution according to the padding value; the filled zero values appear only in the first and last columns, or the first and last rows, of the matrix;
S4, grouping the dilated convolution; when the dilated convolution is grouped, if a zero value appears in a row or column of a candidate group region, the region containing the zero values is placed in a separate group;
S5, performing the operation on the grouped convolution; when the operation is performed on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination;
S6, collecting the operation results of all groups and outputting the final result.
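Steps S1 to S6 can be sketched for a single row of data as follows. This is only one possible reading of the flow, not the patent's implementation: the function name `grouped_dilated_conv1d`, the 1-D simplification and the rule "outputs whose kernel taps fall on the same mix of real data and padding form one group" are assumptions made for illustration.

```python
import numpy as np

def grouped_dilated_conv1d(row, kernel, dilation, padding):
    """Sketch of S1-S6 for one row: group outputs by which kernel taps hit real data."""
    row = np.asarray(row, dtype=float)        # S1: original data
    kernel = np.asarray(kernel, dtype=float)  # S1: convolution kernel
    n, k = len(row), len(kernel)
    # S2: dilation and padding determine how many outputs the row produces.
    out_len = n + 2 * padding - dilation * (k - 1)
    # S3: input position of every tap in unpadded coordinates;
    # positions < 0 or >= n fall into the zero padding.
    pos = np.arange(out_len)[:, None] - padding + dilation * np.arange(k)[None, :]
    valid = (pos >= 0) & (pos < n)
    # S4: outputs that share the same valid-tap pattern form one group.
    groups = {}
    for o in range(out_len):
        pattern = tuple(bool(v) for v in valid[o])
        groups.setdefault(pattern, []).append(o)
    # S5: per group, multiply only the taps that touch real data.
    out = np.zeros(out_len)
    for pattern, outs in groups.items():
        taps = [t for t in range(k) if pattern[t]]
        for o in outs:
            # S6: every group writes into the same final output row.
            out[o] = sum(row[pos[o, t]] * kernel[t] for t in taps)
    return out, groups

out, groups = grouped_dilated_conv1d(np.arange(19), [1, 2, 3], dilation=6, padding=6)
for pattern, outs in groups.items():
    print(pattern, len(outs))
```

For a 19-wide row with dilation 6 and padding 6 this yields three groups of 6, 7 and 6 outputs; the two edge groups skip the tap that would only multiply padding zeros, which is where the multiplier savings come from.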
Referring to Fig. 3, which shows a schematic diagram of specific embodiment A according to the present invention: embodiment A illustrates row-direction matrix multiplication grouping for dilated and non-dilated convolution. The left table in the diagram represents the original convolution, i.e. with no dilation value inserted. The middle table shows how the matrix changes when the input dilation value is 6: the first column is unchanged, the second column becomes 6 times the original data, and the third column is treated likewise. The right table shows the construction of the corresponding convolution kernel when a 3 × 3 filter is used, i.e. the 3 × 3 convolution kernel mentioned above.
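The index scaling described for embodiment A can be illustrated with a small sketch. Fig. 3 is not reproduced here, so the following assumes that what is multiplied by the dilation value is the column offset of each kernel tap relative to the first column; the helper names `tap_columns` and `tap_table` are hypothetical.

```python
def tap_columns(kernel_size, dilation):
    """Column offsets read by one kernel row, relative to its first tap."""
    return [t * dilation for t in range(kernel_size)]

def tap_table(n_rows, kernel_size, dilation):
    """N x kernel_size table: row r lists the input indices used for output r."""
    return [[r + c for c in tap_columns(kernel_size, dilation)] for r in range(n_rows)]

print(tap_columns(3, 1))   # [0, 1, 2]
print(tap_columns(3, 6))   # [0, 6, 12]
print(tap_table(2, 3, 6))  # [[0, 6, 12], [1, 7, 13]]
```

Under this reading, the first offset stays at 0 while the second and third offsets grow from 1 and 2 to 6 and 12, matching the "first column unchanged, other columns times 6" description.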
Referring to Fig. 4, which shows a schematic diagram of specific embodiment B according to the present invention: the data of the data matrix are scanned in the column direction.
In the SSD-VGG300 network, if the dilated convolution input data size is 19x19x512, 19 valid values or data points need to be taken. Figs. 5 and 6 show the computation under the SSD-VGG300 network in the traditional mode and the grouped mode respectively. In the traditional mode the valid values or data points are 0 to 18; in the grouped mode they are 6 to 24. Assume that the maximum computing power provided is 12 × 3 × n multiplications and that the data RAM can fetch at most 14 groups of data.
When the traditional mode is used, 10 computations are needed, i.e. the 19 required valid values or data points are only all obtained after 10 passes, and the computation amount of a single pass is 2*3*n.
When the values are converted to the dilated convolution, the dilation value and the padding value are both 6. An X in the table indicates no data, i.e. the zero value mentioned above. When a no-data position appears in a row, that row drops from three-dimensional to two-dimensional data. As described above, the grouped mode places the regions in which zero values appear into separate groups. After grouping, only 6 computations are needed, and the largest single computation is 6 × 2*n. Compared with the traditional mode, the grouped mode improves efficiency by 40%.
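The "three-dimensional to two-dimensional" drop and the 40% figure can be checked with a short sketch. The function `valid_tap_counts` is a hypothetical helper that counts, for each output position along one 19-wide row, how many of the three kernel taps land on real data rather than padding; the pass counts 10 and 6 are simply the figures quoted in the text, not derived here.

```python
from itertools import groupby

def valid_tap_counts(width, kernel_size=3, dilation=6, padding=6):
    """Number of kernel taps that hit real data for each output position."""
    out_len = width + 2 * padding - dilation * (kernel_size - 1)
    counts = []
    for o in range(out_len):
        taps = [o - padding + t * dilation for t in range(kernel_size)]
        counts.append(sum(0 <= p < width for p in taps))
    return counts

counts = valid_tap_counts(19)
# Collapse consecutive equal counts into (taps, number_of_outputs) runs.
runs = [(c, len(list(g))) for c, g in groupby(counts)]
print(runs)  # [(2, 6), (3, 7), (2, 6)]

# Pass counts as quoted in the text (assumed, not derived here).
passes_traditional, passes_grouped = 10, 6
saving = (passes_traditional - passes_grouped) / passes_traditional
print(f"{saving:.0%}")  # 40%
```

The six outputs at each edge need only two multiplications per kernel row, which is why those regions are natural candidates for their own groups.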
Referring to Fig. 7, which shows a schematic diagram of specific embodiment D according to the present invention: embodiment D illustrates that, in the SSD-VGG200 network, the dilated convolution input data size is 12 × 12 × 512, i.e. 19 valid values or data points need to be taken, and the convolution dilation value and padding value are both 6. An X in the table indicates no data, and a row containing it drops from three-dimensional to two-dimensional. Assume also that the maximum computing power provided is 12 × 3 × n multiplications and that the data RAM can fetch at most 14 groups of data. With this method the data are divided into two groups, and a single computation of 12 × 2 × n multiplications completes the operation, so multiplier utilization reaches 66.7%. The traditional mode needs 6 passes, and the utilization of a single pass is only one sixth. The utilization of the former is therefore 4 times that of the traditional method and its computing performance is 6 times that of the traditional method, a highly significant improvement.
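The utilization figures in this embodiment follow from simple ratios, sketched below with exact fractions. The variable names are illustrative, the common factor n is dropped because it cancels, and a row width of 12 is assumed from the stated 12 × 12 × 512 input; the traditional-mode values (6 passes, one sixth utilization) are taken from the text as given.

```python
from fractions import Fraction

lanes, taps = 12, 3                   # multiplier array: 12 x 3 (x n)
width, dilation, padding = 12, 6, 6   # assumed 12-wide row, dilation = padding = 6

# Taps that hit real data for each of the 12 output positions.
valid = [sum(0 <= o - padding + t * dilation < width for t in range(taps))
         for o in range(width)]
print(valid)  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

grouped_util = Fraction(lanes * 2, lanes * taps)   # 12 x 2 used out of 12 x 3
traditional_util = Fraction(1, 6)                  # figure quoted in the text
utilization_ratio = grouped_util / traditional_util
speedup = Fraction(6, 1)                           # 6 traditional passes vs 1 grouped pass

print(f"{float(grouped_util):.1%}")  # 66.7%
print(utilization_ratio, speedup)    # 4 6
```

Every output position has exactly two useful taps in this configuration, so a single 12 × 2 pass covers the whole row.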
The above are only preferred embodiments of the present invention, and the invention is not limited to the above embodiments. As long as the technical effect of the invention is achieved by the same means, any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection. Within the scope of protection of the invention, the technical solution and/or embodiments may have various modifications and variations.

Claims (9)

1. A dilated convolution acceleration system, characterized in that the system comprises the following modules:
a preparation module, for inputting the original convolution and the convolution kernel to be used into the system;
a parameter setting module, for inputting the dilation value and the padding value into the system;
a generation module, for converting the original convolution into a dilated convolution according to the dilation value and the padding value;
a grouping module, for grouping the dilated convolution;
a computing module, for performing the operation on the grouped convolution;
an accumulation module, for collecting the operation results of all groups and outputting the final result.
2. The dilated convolution acceleration system according to claim 1, characterized in that the convolution kernel is a matrix region with equal numbers of rows and columns.
3. The dilated convolution acceleration system according to claim 2, characterized in that the convolution kernel includes 2 × 2, 3 × 3, 4 × 4 and 5 × 5 matrices.
4. The dilated convolution acceleration system according to claim 1, characterized in that the dilation value and the padding value must be equal.
5. The dilated convolution acceleration system according to claim 1, characterized in that zero values are filled into the original convolution according to the padding value.
6. The dilated convolution acceleration system according to claim 5, characterized in that the filled zero values appear only in the first and last columns, or the first and last rows, of the matrix.
7. The dilated convolution acceleration system according to claim 1, characterized in that, when the dilated convolution is grouped, if a zero value appears in a row or column of a candidate group region, the region containing the zero values is placed in a separate group.
8. The dilated convolution acceleration system according to claim 1, characterized in that, when the operation is performed on the grouped regions, the same convolution kernel may be used, or different convolution kernels may be used in combination.
9. A dilated convolution acceleration method, characterized in that the method comprises the following steps:
S1, inputting the original convolution and the convolution kernel to be used into the system;
S2, inputting the dilation value and the padding value into the system;
S3, converting the original convolution into a dilated convolution according to the dilation value and the padding value;
S4, grouping the dilated convolution;
S5, performing the operation on the grouped convolution;
S6, collecting the operation results of all groups and outputting the final result.
Application CN201811573074.9A, priority date 2018-12-21, filing date 2018-12-21: Dilated convolution acceleration system and method (Pending), published as CN109800857A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110680278A (en) * 2019-09-10 2020-01-14 广州视源电子科技股份有限公司 Electrocardiosignal recognition device based on convolutional neural network
CN110680278B (en) * 2019-09-10 2022-07-19 广州视源电子科技股份有限公司 Electrocardiosignal recognition device based on convolutional neural network
CN111951269A (en) * 2020-10-16 2020-11-17 深圳云天励飞技术股份有限公司 Image processing method and related equipment
CN112836803A (en) * 2021-02-04 2021-05-25 珠海亿智电子科技有限公司 Data placement method for improving convolution operation efficiency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190524