CN107656899A

CN107656899A - Mask convolution method and system based on FPGA

Info

Publication number: CN107656899A
Application number: CN201710888288.4A
Authority: CN
Inventors: 李东; 敖晟; 田劲东; 田勇
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2017-09-27
Filing date: 2017-09-27
Publication date: 2018-02-02

Abstract

The invention discloses an FPGA-based mask convolution method and system. The method comprises: obtaining the data bit width of the image data and selecting a register group of corresponding depth based on the data bit width; obtaining the image data and storing it in the register group, and obtaining the convolution coefficients and storing them in ROM; obtaining a selection parameter that associates the register group with the convolution coefficients; and extracting the data stored in the register group together with the corresponding convolution coefficients, multiplying them, and adding the products by means of an adder group to realize the convolution operation. The system is used to perform the method. By selecting a register group matched to the image data for data storage, storing the convolution coefficients in ROM, multiplying according to the correspondence between registers and convolution coefficients, and summing the products with an adder group, the invention improves the convolution processing result and the processing efficiency.

Description

An FPGA-based mask convolution method and system

Technical field

The present invention relates to the technical field of image processing, and more particularly to an FPGA-based mask convolution implementation method and system.

Background

In digital image processing, processing an image in the spatial domain is an important approach. Among the common spatial filtering operations, both linear and nonlinear, image convolution is one of the most frequently used. Because convolution requires a very large number of multiply-add operations, processing a high-resolution image takes too long. Conventional implementations use a general-purpose CPU or DSP and carry out mask convolution in a pipelined manner. Owing to the limited speed of a CPU or DSP, the conventional approach can no longer meet the requirements of high-speed, real-time designs.

Summary of the invention

To solve the above problems, the present invention provides an FPGA-based mask convolution method and system.

In one aspect, the technical solution adopted by the present invention is an FPGA-based mask convolution implementation method, comprising the steps of: obtaining the data bit width of the image data, and selecting a register group of corresponding depth based on the data bit width; obtaining the image data and storing it in the register group, and obtaining the convolution coefficients and storing them in ROM; obtaining a selection parameter for associating the register group with the convolution coefficients; and extracting the data stored in the register group and the corresponding convolution coefficients and multiplying them, the products being added by an adder group to realize the convolution operation.

Preferably, a shift register group of corresponding depth is selected based on the data bit width; the shift register group is used to obtain the image data, and the register group obtains the image data from the shift register group and stores it.

Preferably, a data volume parameter and a window size parameter of the image data to be processed are obtained; registers of corresponding number and arrangement are selected from the register group based on the data volume parameter, and the image data is stored in them; and the data stored in the correspondingly arranged registers is extracted based on the window size parameter and multiplied by the corresponding convolution coefficients, the adder group adding the products to realize the convolution operation.

Preferably, the convolution coefficients are stored in a Hex file, and the Hex file is stored in the ROM.

Preferably, the adder group obtains the multiplication results and adds them using a tree structure.

In another aspect, the technical solution adopted by the present invention is an FPGA-based mask convolution implementation system, comprising: a parameter input module, for obtaining the data bit width of the image data and selecting a register group of corresponding depth based on the data bit width; a data input module, for obtaining the image data and storing it in the register group, and for obtaining the convolution coefficients and storing them in ROM; and a computing module, for obtaining a selection parameter for associating the register group with the convolution coefficients, the computing module being further used to extract the data stored in the register group and the corresponding convolution coefficients and multiply them, the products being added by an adder group to realize the convolution operation.

Preferably, the parameter input module is further used to select a shift register group of corresponding depth based on the data bit width; the shift register group is used to obtain the image data, and the register group obtains the image data from the shift register group and stores it.

Preferably, the system further comprises a window module, for obtaining a data volume parameter and a window size parameter of the image data to be processed; the data input module selects registers of corresponding number and arrangement from the register group based on the data volume parameter and stores the image data in them; and the computing module extracts the data stored in the correspondingly arranged registers based on the window size parameter and multiplies it by the corresponding convolution coefficients, the adder group adding the products to realize the convolution operation.

The beneficial effects of the present invention are as follows: a register group matched to the image data is selected for data storage, ROM is used to store the convolution coefficients, the multiplications are carried out according to the correspondence between registers and convolution coefficients, and the products are summed by an adder group to realize the convolution operation, which improves the convolution processing result and the processing efficiency.

Brief description of the drawings

Fig. 1 is a schematic diagram of the basic FPGA structure according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the basic convolution module according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the multiplication according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the adder tree according to an embodiment of the present invention.

Detailed description of the embodiments

The present invention is described below with reference to embodiments.

Embodiment 1 of the invention is an FPGA-based mask convolution implementation method, comprising the steps of: obtaining the data bit width of the image data, and selecting a register group of corresponding depth based on the data bit width; obtaining the image data and storing it in the register group, and obtaining the convolution coefficients and storing them in ROM; obtaining a selection parameter for associating the register group with the convolution coefficients; and extracting the data stored in the register group and the corresponding convolution coefficients and multiplying them, the products being added by an adder group to realize the convolution operation.

The method of the embodiment further comprises: selecting a shift register group of corresponding depth based on the data bit width, the shift register group being used to obtain the image data, and the register group obtaining the image data from the shift register group and storing it.

The method of the embodiment further comprises: obtaining a data volume parameter and a window size parameter of the image data to be processed; selecting registers of corresponding number and arrangement from the register group based on the data volume parameter and storing the image data in them; and extracting the data stored in the correspondingly arranged registers based on the window size parameter and multiplying it by the corresponding convolution coefficients, the adder group adding the products to realize the convolution operation.

In the method of the embodiment, the convolution coefficients are stored in a Hex file, and the Hex file is stored in the ROM.

In the method of the embodiment, the adder group obtains the multiplication results and adds them using a tree structure.

With the progress of integrated circuit technology, FPGA performance has improved markedly, offering users more resources and higher processing speed. The scheme is implemented on an FPGA. Various externally supplied parameters are obtained first, such as the size of the image data to be convolved, the data bit width, and the size of the convolution window (i.e. the mask). A register group of corresponding depth is selected according to the data bit width: because an FPGA platform may contain registers of many specifications, selecting registers whose specification suits the data bit width increases the utilization of the whole system, and the selected registers are designated as the register group. The image data to be processed is obtained and stored in the register group, and the convolution coefficients (i.e. the multiplication coefficients) are obtained and stored in ROM, where they are held in a file stored in the ROM. A correspondence is set between register addresses and file addresses, so that when the data stored in a register is extracted, the value at the corresponding address in the file (i.e. the convolution coefficient) can be extracted at the same time. The data and the value are multiplied, and the products of all registers (i.e. the register group) with their corresponding convolution coefficients are added, realizing the convolution operation.
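
The flow above can be illustrated with a small software model. The following Python sketch is only a behavioural illustration, not the hardware design: the names `mask_convolve`, `registers`, `rom` and `address_map` are hypothetical, and the example values are invented for demonstration.

```python
def mask_convolve(registers, rom, address_map):
    """Multiply each register value by the coefficient found at its
    mapped ROM address, then add all the products."""
    products = [registers[i] * rom[address_map[i]]
                for i in range(len(registers))]
    return sum(products)


# Hypothetical 3 x 3 example: nine window registers and nine coefficients.
registers = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # image data in the register group
rom = [1, 0, -1, 2, 0, -2, 1, 0, -1]                # convolution coefficients in ROM
address_map = list(range(9))                        # register i -> ROM address i (assumed)

result = mask_convolve(registers, rom, address_map)
print(result)
```

Here `address_map` plays the role of the selection parameter that associates each register with a coefficient address; in this sketch it is simply the identity mapping.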

As a further improvement on Embodiment 1, because of factors such as limited computing capability, data is usually processed through a convolution window. In this case a parameter describing the window size (i.e. the window size parameter) is obtained first. For example, for an N*N window, the convolution window consists of N*N registers whose depth matches the bit width of the stored image data. The data of these N*N registers is read, the multiplications described above are performed (one per register), and the products are added, realizing the convolution operation. Limiting the window size makes it possible to control the processing capacity required of the whole FPGA.

As a further improvement on Embodiment 1, a certain number of shift registers are provided (or selected) to serve as buffer units. In the example above, where the convolution window is N*N, N-1 shift registers are provided accordingly. The shift registers are connected head to tail to form rows, and the rows of registers in the convolution window correspond one to one with the shift registers. This arrangement carries the data: the shift registers receive the external data (i.e. the image data), and the convolution window obtains the image data from the shift registers.
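
To make the buffering idea concrete, the sketch below models N-1 chained line buffers feeding an N*N window, one pixel per step in raster order. It is a simplified software model under stated assumptions: the function name `stream_windows`, the zero initial contents, and the exact tap points are illustrative choices, not details taken from the patent.

```python
from collections import deque


def stream_windows(pixels, width, n):
    """Feed pixels in raster order and yield every complete n x n window."""
    # n-1 line buffers, each one image row long, chained head to tail.
    line_buffers = [deque([0] * width, maxlen=width) for _ in range(n - 1)]
    # n rows of n window registers; row 0 holds the newest image row.
    window = [deque([0] * n, maxlen=n) for _ in range(n)]
    for count, pix in enumerate(pixels):
        # Tap 0 is the incoming pixel; tap k is the output of line buffer k-1,
        # i.e. the pixel k rows above the incoming one in the same column.
        taps = [pix] + [lb[0] for lb in line_buffers]
        for row, tap in zip(window, taps):
            row.append(tap)              # shift the window row by one register
        for lb, tap in zip(line_buffers, taps):
            lb.append(tap)               # shift each line buffer by one pixel
        r, c = divmod(count, width)
        if r >= n - 1 and c >= n - 1:    # the window lies fully inside the image
            yield [list(row) for row in reversed(window)]


image = list(range(16))                  # hypothetical 4 x 4 image, values 0..15
windows = list(stream_windows(image, width=4, n=3))
print(windows[0])
```

Each step shifts every buffer by one position, which mirrors the way a hardware shift register chain advances once per clock.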

As a further improvement on Embodiment 1, the convolution window can be modified to any shape and size. For an M*N rectangular window, the convolution window is adjusted to consist of M*N registers and the number of shift registers is adjusted to N-1. For a circular window of radius R, the convolution window is adjusted to consist of (2R+1)*(2R+1) registers and the number of shift registers is adjusted to 2R+1; the circular window array of radius R is then marked out among the (2R+1)*(2R+1) window registers. Windows of any other shape and size can be obtained by cropping a corresponding rectangular window.
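
One possible way to mark a circular window inside its (2R+1)*(2R+1) bounding square is sketched below. The membership rule (squared Euclidean distance from the centre no greater than R squared) is an assumption made for illustration; the patent does not state which rule is used, and `circular_mask` is a hypothetical name.

```python
def circular_mask(radius):
    """Return a (2R+1) x (2R+1) grid of booleans; True marks a register
    that belongs to the circular window (assumed rule: dx^2 + dy^2 <= R^2)."""
    size = 2 * radius + 1
    return [[(r - radius) ** 2 + (c - radius) ** 2 <= radius ** 2
             for c in range(size)]
            for r in range(size)]


mask = circular_mask(2)
active = sum(cell for row in mask for cell in row)   # registers actually used
for row in mask:
    print("".join("#" if cell else "." for cell in row))
```

The count `active` would then be the number of window registers that take part in the multiplication, and hence the number of coefficients to address.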

As a further note on Embodiment 1, Hex files are easy to edit in Quartus II, and signed decimal input can be selected, which is convenient for the user, so a Hex file is well suited as the carrier of the convolution coefficients. After the coefficients are written and saved, the Hex file is imported into the FPGA on-chip ROM, and the ROM address range to be read is adjusted automatically according to the window size.
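
As an illustration of what such a coefficient file might contain, the sketch below encodes signed 8-bit coefficients as two's complement bytes and wraps them in Intel HEX records. The layout (one data record per coefficient, one byte per address) is an assumption for demonstration and may differ from what a given tool generates; `to_twos_complement`, `hex_record` and `coefficients_to_hex` are hypothetical helper names.

```python
def to_twos_complement(value, bits=8):
    """Encode a signed integer as an unsigned two's complement word."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    return value & ((1 << bits) - 1)


def hex_record(address, data_bytes, record_type=0x00):
    """Build one Intel HEX record: byte count, 16-bit address, type, data, checksum."""
    body = [len(data_bytes), (address >> 8) & 0xFF, address & 0xFF, record_type]
    body += list(data_bytes)
    checksum = (-sum(body)) & 0xFF       # two's complement of the byte sum
    return ":" + "".join(f"{b:02X}" for b in body + [checksum])


def coefficients_to_hex(coefficients):
    """One data record per coefficient (assumed layout), then an end-of-file record."""
    lines = [hex_record(addr, [to_twos_complement(q)])
             for addr, q in enumerate(coefficients)]
    lines.append(hex_record(0, [], record_type=0x01))
    return lines


hex_lines = coefficients_to_hex([-1, 2])
print("\n".join(hex_lines))
```

Addresses start at 0 and increase by one per coefficient, matching the ROM addressing described in the text.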

For an N × N convolution window, the ROM addressing range is 0 to N² - 1. The N² convolution coefficients are read out, and the ROM output is connected to an N²-stage pipeline. Adjusting the values and the number of the coefficients in this way is very convenient and allows arbitrary coefficients to be extracted and applied. For a convolution window of arbitrary shape and size, the ROM addressing range equals the number of window registers.

As a further note on Embodiment 1, ordinary addition extracts the data one item at a time and accumulates it, which lengthens the addition time when there is a lot of data. On an FPGA, each adder of the adder group can instead be connected to two adjacent registers and obtain the corresponding multiplication results at the same time; the results of two adders are then taken and added again. In this way multiple data items are processed within one clock, the number of clocks and adders required is greatly reduced, and operational efficiency is improved while resource usage is reduced.
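
The pairwise addition described above can be modelled in software as a tree reduction. The sketch below is a behavioural illustration only; `adder_tree_sum` is a hypothetical name, and passing an unpaired value through unchanged is an assumption about how an odd count is handled.

```python
def adder_tree_sum(products):
    """Add values pairwise, level by level, as an adder tree would.
    Returns the total and the number of levels (clock stages) used."""
    level = list(products)
    if not level:
        return 0, 0
    stages = 0
    while len(level) > 1:
        # Each adder takes two adjacent values from the previous level.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])        # unpaired value is carried to the next level
        level = nxt
        stages += 1
    return level[0], stages


total, stages = adder_tree_sum(range(1, 10))   # nine products, as for a 3 x 3 window
print(total, stages)
```

For nine inputs the levels shrink as 9, 5, 3, 2, 1, so four stages suffice where a sequential accumulator would need eight additions in a row.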

Embodiment 2 of the invention is an FPGA-based mask convolution implementation system, comprising: a parameter input module, for obtaining the data bit width of the image data and selecting a register group of corresponding depth based on the data bit width, and also for obtaining the shape and size of the convolution window and automatically generating the corresponding convolution operation unit based on these parameters; a data input module, for obtaining the image data and storing it in the register group, and for obtaining the convolution coefficients and importing them into ROM; and a computing module, for obtaining a selection parameter for associating the register group with the convolution coefficients, the computing module being further used to extract the data stored in the register group and the corresponding convolution coefficients and multiply them, the products being added by an adder group to realize the convolution operation.

In the system of the embodiment, the parameter input module is further used to select a shift register group of corresponding depth based on the data bit width; the shift register group is used to obtain the image data, and the register group obtains the image data from the shift register group and stores it.

The system of the embodiment further comprises a window module, for obtaining a data volume parameter and a window size parameter of the image data to be processed; the data input module selects registers of corresponding number and arrangement from the register group based on the data volume parameter and stores the image data in them; and the computing module extracts the data stored in the correspondingly arranged registers based on the window size parameter and multiplies it by the corresponding convolution coefficients, the adder group adding the products to realize the convolution operation.

In the system of the embodiment, the convolution coefficients are stored in a Hex file, and the Hex file is stored in the ROM.

In the system of the embodiment, the adder group obtains the multiplication results and adds them using a tree structure.

Embodiment 3 of the invention describes the process by which the convolution is realized on the FPGA:

The basic FPGA structure shown in Fig. 1 comprises a central control unit, an input unit (i.e. a data interface), a line buffer unit (made up of shift registers), a convolution window unit (the register group), a convolution operation unit (which obtains the data of the register group and performs the multiplications), an adder tree unit (the adder group), a template parameter interface (i.e. a data interface or data input), a convolution coefficient interface (i.e. the ROM, which stores the multiplication coefficients) and an output unit. The input unit is connected to the line buffer unit, the line buffer unit to the convolution window unit, the convolution window unit to the convolution operation unit, the convolution operation unit to the adder tree unit, and the adder tree unit to the output unit. The template parameter interface is connected to both the line buffer unit and the convolution window unit (to define the numbers of shift registers and registers). The convolution coefficient interface is connected to the convolution operation unit to supply the multiplication coefficients.

First step: establish the basic convolution module shown in Fig. 2 (comprising the external auxiliary units, the convolution window unit, the line buffer unit and the convolution operation unit), wherein:

Line_buffer1 to Line_buffer(N-1) are N-1 row register groups (made up of shift registers); Din and Dout are respectively the input and the output of each row register group; and the window contains N² registers. The shift registers are connected head to tail. The number of shift registers in each row register group equals the number of pixels in one image row, and the shift register depth equals the bit width of the received image data. Each shift register row is externally connected to N registers, and these N registers are likewise connected head to tail.

The data of the N² registers in the convolution window unit is multiplied by the externally supplied weight coefficients (Weights, stored in ROM), the products are then added pairwise by the adder tree module, and the convolution result is finally obtained.

As can be seen from Fig. 2, Din0 is connected to the data input Pix_in, and the output Dout(M) of each remaining M-th shift register is connected both to the input Din(M+1) of the (M+1)-th shift register and to the window register P_MN. Each window register P_M also forms a pipeline with the preceding register P_M-1. The whole structure is realized by instantiating the modules repeatedly in a loop.

The external auxiliary units comprise the ROM and the template parameter interface (used to define parameters, such as the N*N Shift Registers in the figure). In the figure, Multipliers is the convolution operation unit and Add Tree is the adder tree unit.

Second step: as shown in the multiplication diagram of Fig. 3, the convolution coefficients are input and the convolution window (i.e. the convolution window unit) is multiplied by the corresponding convolution coefficients. The coefficients are written to a Hex file from outside, the Hex file is imported into the FPGA on-chip ROM, and the ROM address range to be read is adjusted automatically according to the window size. Taking a 3 × 3 window as an example, with a data bit width of 8 bits, the coefficients Q₀ to Q₈ take values in the range -128 to +127. The mask coefficients Q₀ to Q₈ are arranged as a 3 × 3 matrix, and they are written to the Hex file in the order shown in Fig. 3.

After the coefficients are imported, address locations 0 to 8 of the ROM are addressed and the output is connected to a 9-stage pipeline. For an N × N convolution window, the ROM addressing range is 0 to N² - 1; after the N² coefficients are read out, they are multiplied with the convolution window register array.
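
A short worked calculation shows how wide the accumulated result of the 3 × 3 example can become. The sketch assumes unsigned 8-bit pixels (0 to 255), which the text does not state explicitly, together with the signed coefficient range -128 to +127 given above; `signed_bits_needed` is a hypothetical helper.

```python
def signed_bits_needed(lo, hi):
    """Smallest two's complement width able to hold every integer in [lo, hi]."""
    neg_bits = (-lo - 1).bit_length() if lo < 0 else 0
    pos_bits = hi.bit_length() if hi > 0 else 0
    return max(neg_bits, pos_bits) + 1           # one extra bit for the sign


taps = 3 * 3                  # 3 x 3 window
pixel_max = 255               # assumed unsigned 8-bit image data
coeff_min, coeff_max = -128, 127

sum_min = taps * pixel_max * coeff_min           # most negative possible sum
sum_max = taps * pixel_max * coeff_max           # most positive possible sum
accumulator_bits = signed_bits_needed(sum_min, sum_max)
print(sum_min, sum_max, accumulator_bits)
```

Under these assumptions the adder tree output needs a wider word than either the pixels or the coefficients, which is worth keeping in mind when sizing the output register.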

Third step: as shown in the adder tree diagram of Fig. 4, the multiplication results of the convolution operation (D₀ to D_N, where D denotes a product and N the corresponding index; in the figure, Reg denotes a register and the remaining symbol an adder) are fed into the adder tree module, and once accumulation finishes the complete convolution result is output. Adjacent data are added in pairs. If N² is even, the first round needs N²/2 adders and N²/2 registers, and the second round needs N²/4 adders and N²/4 registers. If N² is odd, the first round needs N²/2 adders and N²/2+1 registers, and the second round needs N²/4 adders and N²/4+(N²/2)%2 registers, the divisions here being integer divisions. By analogy, the number of data items to be added in each round is:

a₀ = N*N, where a_n denotes the number of data items to be added in the n-th round, a₀ is the initial value, N*N is the convolution window size, and each round takes one clock.

The number of rounds n needed to complete the adder tree operation satisfies $2^{n-1} < N \times N \le 2^{n}$, and the number of clocks required is also n. Therefore, for a total of N data items, the number of rounds and clock cycles n needed to finish the accumulation satisfies $\log_2 N \le n < \log_2 N + 1$.
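
The relation between the per-round counts and the bound on n can be checked numerically. The sketch below assumes the recurrence a_n = a_(n-1) // 2 + a_(n-1) % 2 (pairs are added and an unpaired item is carried forward); this recurrence is inferred from the even and odd cases described above rather than quoted from the patent, and `adder_tree_levels` is a hypothetical name.

```python
def adder_tree_levels(count):
    """Return [a_0, a_1, ..., 1]: the number of items entering each round,
    assuming a_n = a_(n-1) // 2 + a_(n-1) % 2."""
    counts = [count]
    while counts[-1] > 1:
        counts.append(counts[-1] // 2 + counts[-1] % 2)
    return counts


for side in range(2, 11):                     # window sides N = 2 .. 10
    levels = adder_tree_levels(side * side)
    n = len(levels) - 1                       # number of rounds, one clock each
    assert 2 ** (n - 1) < side * side <= 2 ** n
    print(side, levels, n)

levels_3x3 = adder_tree_levels(9)
```

For a 3 × 3 window this gives the sequence 9, 5, 3, 2, 1, i.e. four rounds, consistent with the inequality for N*N = 9.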

The convolution window unit is a flexibly adjustable hardware structure: windows of different shapes and sizes are generated automatically from the input parameters, so no resources are wasted.

The above are merely preferred embodiments of the present invention, and the invention is not limited to them. Any implementation that achieves the technical effect of the invention by the same means shall fall within its scope of protection. Within that scope, the technical solution and/or the embodiments may be modified and varied in many ways.

Claims

1. An FPGA-based mask convolution implementation method, characterized by comprising the steps of:

obtaining the data bit width of the image data, and selecting a register group of corresponding depth based on the data bit width;

obtaining the image data and storing it in the register group, and obtaining the convolution coefficients and storing them in ROM;

obtaining a selection parameter for associating the register group with the convolution coefficients;

extracting the data stored in the register group and the corresponding convolution coefficients and multiplying them, the products being added by an adder group to realize the convolution operation.

2. The FPGA-based mask convolution implementation method according to claim 1, characterized by further comprising:

selecting a shift register group of corresponding depth based on the data bit width, the shift register group being used to obtain the image data, and the register group obtaining the image data from the shift register group and storing it.

3. The FPGA-based mask convolution implementation method according to claim 2, characterized by further comprising:

obtaining a data volume parameter and a window size parameter of the image data to be processed;

selecting registers of corresponding number and arrangement from the register group based on the data volume parameter and storing the image data in them;

extracting the data stored in the correspondingly arranged registers based on the window size parameter and multiplying it by the corresponding convolution coefficients, the adder group adding the products to realize the convolution operation.

4. The FPGA-based mask convolution implementation method according to any one of claims 1 to 3, characterized in that the convolution coefficients are stored in a Hex file, and the Hex file is stored in the ROM.

5. The FPGA-based mask convolution implementation method according to claim 4, characterized in that the adder group obtains the multiplication results and adds them using a tree structure.

6. An FPGA-based mask convolution implementation system, characterized by comprising:

a parameter input module, for obtaining the data bit width of the image data and selecting a register group of corresponding depth based on the data bit width;

a data input module, for obtaining the image data and storing it in the register group, and for obtaining the convolution coefficients and storing them in ROM;

a computing module, for obtaining a selection parameter for associating the register group with the convolution coefficients;

the computing module being further used to extract the data stored in the register group and the corresponding convolution coefficients and multiply them, the products being added by an adder group to realize the convolution operation.

7. The FPGA-based mask convolution implementation system according to claim 6, characterized in that the parameter input module is further used to select a shift register group of corresponding depth based on the data bit width;

the shift register group is used to obtain the image data, and the register group obtains the image data from the shift register group and stores it.

8. The FPGA-based mask convolution implementation system according to claim 7, characterized by further comprising a window module, for obtaining a data volume parameter and a window size parameter of the image data to be processed;

the data input module selects registers of corresponding number and arrangement from the register group based on the data volume parameter and stores the image data in them;

the computing module extracts the data stored in the correspondingly arranged registers based on the window size parameter and multiplies it by the corresponding convolution coefficients, the adder group adding the products to realize the convolution operation.

9. The FPGA-based mask convolution implementation system according to any one of claims 6 to 8, characterized in that the convolution coefficients are stored in a Hex file, and the Hex file is stored in the ROM.

10. The FPGA-based mask convolution implementation system according to claim 9, characterized in that the adder group obtains the multiplication results and adds them using a tree structure.