CN208766715U - The accelerating circuit of 3*3 convolution algorithm - Google Patents

Publication number
CN208766715U
Authority
CN
China
Prior art keywords
convolution
row
pixel data
state
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn - After Issue
Application number
CN201821189844.5U
Other languages
Chinese (zh)
Inventor
何再生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Amicro Semiconductor Co Ltd
Original Assignee
Zhuhai Amicro Semiconductor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The utility model discloses an accelerating circuit for 3*3 convolution operations, comprising a DDR module, a convolution result FIFO module, a main control module (a main state machine control module), a shift selection control module, a row buffer module and a convolution calculation module. The main control module burst-reads the pixel data of the current two adjacent rows of the input image from the pixel storage array over the AHB bus, and controls the parallel shifting of pixel data in the shift selection control module so that the pixel data written into the convolution calculation module each time is convolved with the corresponding convolution kernel data. The operation results that the convolution calculation module produces for the current two adjacent rows are then written to the convolution result storage array over the AHB bus. The pixel data of the next row is then read over the AHB bus and shifted and convolved in the same way. Once all pixel data of the input image has been processed, an interrupt is issued to inform the CPU of the result of the convolution processing, which reduces software instruction overhead.

Description

Accelerating circuit for 3*3 convolution operations
Technical field
The utility model relates to the technical field of machine vision inspection, and in particular to an accelerating circuit for 3*3 convolution operations.
Background
At present, when a sweeping robot uses video images for mapping and localization, the image data captured by the camera must first be preprocessed by a set of image processing algorithms, such as image filtering, image noise removal, image feature enhancement and image smoothing.
In the existing field of machine vision, window processing is a common form of image processing. Its idea is to apply an arithmetic operation to the image matrix through a small matrix of fixed size (such as 3*3). Common window operations include morphological operations, blur filtering, Gaussian filtering and so on, among which the convolution operation is widely used. However, a software convolution has to read in the image data, cache it, compute on it and write it out again, which consumes a large number of software instructions and occupies a large amount of software resources, resulting in low efficiency.
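The software procedure that the background describes (read, cache, compute, write out) can be sketched as a plain nested loop. This is only an illustrative baseline, not code from the patent: the function name `conv3x3_valid` is hypothetical, the kernel is applied without flipping (the correlation form common in image filtering, which the patent does not specify), and border pixels are skipped, which is exactly the limitation the utility model later addresses.

```python
def conv3x3_valid(image, kernel):
    """Slide a 3x3 window over `image` (a list of equal-length rows).

    Only interior pixels are produced, so the output has two fewer
    rows and two fewer columns than the input.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            # Nine multiply-accumulate steps per output pixel, all in software.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += image[r + dr][c + dc] * kernel[dr + 1][dc + 1]
            out_row.append(acc)
        out.append(out_row)
    return out
```

Every output pixel costs nine multiplications plus the surrounding loop and memory instructions, which is the instruction overhead the accelerating circuit is meant to remove.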
Summary of the utility model
In order to save software resources and improve instruction efficiency, the utility model uses an application-specific integrated circuit to implement a fast convolution algorithm based on a sliding window of size 3*3. The design allows hardware parallelism and pipelining and thereby accelerates the algorithm. The technical solution is as follows:
An accelerating circuit for 3*3 convolution operations comprises a DDR module for storing the input image and the image convolution results, and a convolution result FIFO module for buffering convolution results. The DDR module contains a pixel storage array, for which the base address and storage space of the input image are configured, and a convolution result storage array, for which the destination address of the image convolution results is configured. The accelerating circuit further comprises a main control module, a shift selection control module, a row buffer module and a convolution calculation module.
The main control module burst-reads the pixel data of the current two adjacent rows of the input image from the pixel storage array over the AHB bus, and controls the parallel shifting of pixel data in the shift selection control module so that the pixel data written into the convolution calculation module each time is aligned with the matching convolution kernel data, which accelerates the calculation. It then writes the operation results that the convolution calculation module produces for the pixel data of the current two adjacent rows to the convolution result storage array over the AHB bus. The main control module also reads, over the AHB bus, the pixel data of the row that follows the current two adjacent rows in the input image and applies the corresponding shifting and convolution processing; once all pixel data of the input image has been processed, it issues an interrupt to inform the CPU of the result of the convolution processing, which reduces software instruction overhead. Here the convolution kernel data is the data stored in the convolution window that the CPU preconfigures in the convolution calculation module. Saying that the pixel data written into the convolution calculation module each time is aligned with the matching convolution kernel data means that, to carry out the convolution, the convolution window holding the kernel data is made to slide over the image matrix corresponding to the pixel data written into the convolution calculation module, forming an overlap region that contains the centre of that convolution window.
The row buffer module comprises line buffers built from shift registers. It buffers the pixel data of the corresponding rows of the input image according to the horizontal pixel data length of the image, and outputs the buffered pixel data to the shift selection control module according to the column address signal and state signal generated by the main control module.
The shift selection control module selects the pixel data of the corresponding rows of the input image in the row buffer module according to the state signal output by the main control module, shifts the pixel data of each column in parallel, and pads the pixel data, so that all pixel data of the input image is written into the convolution calculation module and the convolution is completed.
The convolution calculation module multiplies the pixel data output by the shift selection control module by the corresponding convolution kernel data and sums the products with an adder group to realize the convolution.
Further, the line buffers of the row buffer module comprise a first line buffer, a second line buffer and a third line buffer, each connected in parallel to the main control module through a column address select terminal and a state select terminal. Under the control of the main control module, the first line buffer buffers the pixel data of the first preset row of the input image burst-read over the AHB bus, the second line buffer buffers the pixel data of the second preset row, and the third line buffer buffers the pixel data of the third preset row. The first, second and third preset rows are the row numbers of three mutually adjacent rows of the input image that the AHB bus burst-reads from the pixel storage array; after each burst read, the pixel data corresponding to these three row numbers is updated according to the rule of the matrix convolution operation.
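The way three line buffers can keep three adjacent image rows resident while new rows stream in can be sketched in software. The sketch below is an assumption-laden model, not the patent's circuit: the name `stream_rows` is hypothetical, rows are assigned to buffers round-robin (row index modulo 3), and border handling is ignored.

```python
def stream_rows(image):
    """Load rows one at a time into three reusable buffers.

    Returns, for each step at which three adjacent rows are resident,
    the tuple (oldest row, middle row, newest row).
    """
    buffers = [None, None, None]
    windows = []
    for r, row in enumerate(image):
        # Assumed policy: the new row overwrites the buffer holding the
        # oldest row, which is no longer needed by the 3x3 window.
        buffers[r % 3] = list(row)
        if r >= 2:
            windows.append(tuple(buffers[(r - k) % 3] for k in (2, 1, 0)))
    return windows
```

Only three rows are ever held at once, regardless of the image height, which is why three 1 KB buffers suffice for images up to 1024 pixels wide at 8 bits per pixel.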
Further, the shift selection control module comprises a first selector, a second selector, a third selector and 3*3 convolution window control logic. Each selector has three inputs: a first input, a second input and a third input. The first input of the first selector is connected to the output of the third line buffer, its second input to the output of the first line buffer, and its third input to the output of the second line buffer. The first input of the second selector is connected to the output of the first line buffer, its second input to the output of the second line buffer, and its third input to the output of the third line buffer. The first input of the third selector is connected to the output of the second line buffer, its second input to the output of the third line buffer, and its third input to the output of the first line buffer. The 3*3 convolution window control logic comprises a 3*3 convolution window formed by a first shift register, a second shift register and a third shift register, each made up of three registers. The input of the first shift register is connected to the output of the first selector and buffers the pixel data that the first selector selects into the first row of the 3*3 convolution window; the input of the second shift register is connected to the output of the second selector and buffers the pixel data that the second selector selects into the second row of the 3*3 convolution window; the input of the third shift register is connected to the output of the third selector and buffers the pixel data that the third selector selects into the third row of the 3*3 convolution window. One address input of the first selector is connected to one address input of the second selector, the other address input of the second selector is connected to one address input of the third selector, and the other address input of the third selector is connected to the main control module to receive the state signal.
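The selector wiring listed above amounts to a rotation of the three line buffers onto the three window rows. The following sketch tabulates that wiring; the names `SELECTOR_INPUTS` and `route`, the buffer labels, and the encoding of the select value as 0, 1 or 2 (for the first, second or third input) are assumptions made for illustration, since the patent does not give the signal encoding.

```python
# Input wiring of each selector, in the order (first, second, third input),
# copied from the connections stated in the text.
SELECTOR_INPUTS = {
    "first_selector": ("buf3", "buf1", "buf2"),
    "second_selector": ("buf1", "buf2", "buf3"),
    "third_selector": ("buf2", "buf3", "buf1"),
}


def route(select, buffers):
    """Return the data feeding window rows 1, 2 and 3 for a select value.

    `select` is 0, 1 or 2 (hypothetical encoding) and is shared by all
    three selectors; `buffers` maps "buf1".."buf3" to their contents.
    """
    order = ("first_selector", "second_selector", "third_selector")
    return tuple(buffers[SELECTOR_INPUTS[name][select]] for name in order)
```

With a shared select value, each setting presents all three buffers exactly once, in a rotated order, so the window rows always see three distinct image rows in top-to-bottom order no matter which buffer was refilled most recently.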
Further, the shift selection control module also comprises edge filling logic, which consists of a fill selector, edge detection logic and pixel fill logic. The edge detection logic is connected to both the pixel fill logic and the fill selector; it determines the address position in the input image of the pixel under examination in the 3*3 convolution window and outputs a decision signal to the pixel fill logic and to the select terminal of the fill selector. The fill selector has a filled input and an unfilled input, and writes the pixel data at the input chosen by the decision signal received at its select terminal into the convolution calculation module. The pixel fill logic, according to the decision signal of the edge detection logic, symmetrically pads the pixel data shifted out by the 3*3 convolution window control logic, so that the image matrix framed by the 3*3 convolution window and centred on a boundary pixel can complete a planar convolution with the convolution kernel data, and outputs the result to the filled input of the fill selector.
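A software analogue of the edge filling logic is sketched below. The patent only says the padding is symmetric; the exact convention is not stated, so this sketch assumes the missing neighbour is mirrored about the boundary pixel itself (index -1 maps to 1, index n maps to n-2). The function names `mirror_index` and `padded_window` are hypothetical.

```python
def mirror_index(i, n):
    """Map an index that may lie one step outside 0..n-1 back inside.

    Assumed convention: reflect about the border pixel, so -1 -> 1
    and n -> n - 2. Images are at least 2 pixels wide and high.
    """
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def padded_window(image, r, c):
    """Return the 3x3 window centred on (r, c), filling across borders."""
    rows, cols = len(image), len(image[0])
    return [
        [image[mirror_index(r + dr, rows)][mirror_index(c + dc, cols)]
         for dc in (-1, 0, 1)]
        for dr in (-1, 0, 1)
    ]
```

Because every border pixel now has a full 3*3 neighbourhood, the output image keeps the same number of rows and columns as the input.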
Further, the main control module comprises a main state machine whose working states are: first-row initial write, second-row initial write, first-row convolution, third-row write, second-row convolution, first-row write, third-row convolution and second-row write. In the first-row initial write state, the main state machine uses the first-row read/write enable signal to make the AHB bus burst-read the pixel data of the first preset row for the first time and writes it into the first line buffer. In the second-row initial write state, it uses the second-row read/write enable signal to make the AHB bus burst-read the pixel data of the second preset row for the first time and writes it into the second line buffer. In the first-row convolution state, the first-row convolution enable signal causes the pixel data in the first line buffer to be read out, and, according to the column address enable signal, the pixel data read out is shifted into the convolution calculation module for convolution. In the third-row write state, the third-row read/write enable signal makes the AHB bus burst-read the pixel data of the third preset row, which is written into the third line buffer. In the second-row convolution state, the second-row convolution enable signal causes the pixel data in the second line buffer to be read out, and, according to the column address enable signal, the pixel data read out is shifted into the convolution calculation module for convolution. In the first-row write state, the first-row read/write enable signal makes the AHB bus burst-read the pixel data of the updated first preset row, which is written into the first line buffer. In the third-row convolution state, the third-row convolution enable signal causes the pixel data in the third line buffer to be read out, and, according to the column address enable signal, the pixel data read out is shifted into the convolution calculation module for convolution. In the second-row write state, the second-row read/write enable signal makes the AHB bus burst-read the pixel data of the updated second preset row, which is written into the second line buffer. The main state machine also contains a ring counter that generates the state signal corresponding to the current working state of the main state machine. Under the control of the state signal, after the pixel data of the current row has been convolved in the convolution calculation module, the pixel data of the next row is written into the convolution calculation module column by column, while the rows of pixel data still to be processed in the pixel storage array continue to be burst-read over the AHB bus into the line buffer that has become free. The pixel data read in the first-row write, second-row write and third-row write states is thus continually updated until the pixel data of every row of the input image in the pixel storage array has been written into the convolution calculation module and convolved.
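The order of the main state machine's working states can be modelled as two start-up states followed by a repeating loop. This is a sketch under assumptions: the state names are hypothetical labels, and the loop-back from the second-row write state to the first-row convolution state is inferred from the listed order and the ring counter, since the state transition diagram (Fig. 4) is not reproduced here. Termination at the last image row is not modelled.

```python
from itertools import chain, cycle, islice

# The two initial fills happen once, before any convolution.
STARTUP = ("ROW1_FIRST_WRITE", "ROW2_FIRST_WRITE")

# Assumed steady-state loop: convolve one row, then refill another buffer.
LOOP = (
    "ROW1_CONV", "ROW3_WRITE",
    "ROW2_CONV", "ROW1_WRITE",
    "ROW3_CONV", "ROW2_WRITE",
)


def main_states():
    """Yield main state machine states in order, looping indefinitely."""
    return chain(STARTUP, cycle(LOOP))


def first_states(n):
    """Return the first n states as a list."""
    return list(islice(main_states(), n))
```

In this ordering each write state refills the buffer that sits just before the one that was convolved (modulo 3), that is, the buffer holding the oldest row, so a buffer is only overwritten once its row is no longer inside the 3*3 window.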
Further, the main control module also comprises a convolution read/write control state machine, which makes its state transitions while the main state machine is in the first-row convolution state, the second-row convolution state or the third-row convolution state. Its working states are: initial read, read row buffer, shift and write FIFO, write FIFO wait, write bus and write bus wait. In the initial read state, the pixel data of the first column of the first line buffer is read and selected into the 3*3 convolution window. In the read row buffer state, the pixel data of the row buffer module other than the first column of the first line buffer is read and selected into the 3*3 convolution window. In the shift and write FIFO state, according to the count value of the shift counter, the pixel data in the 3*3 convolution window is shifted column by column into the convolution calculation module for convolution, and, according to the count value of the read counter, the 3*3 convolution window is shifted to the next row and its data is passed to the convolution calculation module for convolution. In the write FIFO wait state, the results of the convolution calculation module are written into the convolution result FIFO module until the depth of data stored in the convolution result FIFO module is greater than or equal to the burst write data length configured for the AHB bus. In the write bus state, according to the count value of the write counter, the convolution results stored in the convolution result FIFO module are written onto the AHB bus, until the results computed from the pixel data of all rows and columns of the input image in the pixel storage array have all been written into the convolution result FIFO module. In the write bus wait state, the count value of the write counter or the read counter is used to determine that the results computed from the pixel data of all rows and columns of the input image in the pixel storage array have been written to the AHB bus. Here the count value of the shift counter is the row number with which the pixel data stored in the 3*3 convolution window is written into the convolution calculation module, and the count value of the read counter is the column number produced when the pixel data stored in the 3*3 convolution window is shifted in parallel.
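The interplay of the write FIFO wait and write bus states, in which results accumulate until one full burst can be written, can be sketched as follows. The function name `drain_in_bursts` and its parameters are hypothetical; the depth of 16 comes from the description of the FIFO elsewhere in the document, the burst length of 4 is only an example value, and how a final partial burst is flushed is not stated in the text, so leftovers are simply returned.

```python
from collections import deque


def drain_in_bursts(results, burst_len=4, depth=16):
    """Group a stream of convolution results into bus-sized bursts.

    Results are queued until the fill level reaches `burst_len`, then
    one burst is emitted. Returns (list of bursts, leftover entries).
    """
    if not 0 < burst_len <= depth:
        raise ValueError("burst length must fit in the FIFO")
    fifo = deque()
    bursts = []
    for value in results:
        fifo.append(value)
        # Threshold check: fill level >= configured burst write length.
        if len(fifo) >= burst_len:
            bursts.append([fifo.popleft() for _ in range(burst_len)])
    return bursts, list(fifo)
```

Batching the results this way lets each bus transaction carry a full burst instead of a single 8-bit result, which is the purpose of placing a FIFO between the convolution calculation module and the AHB bus.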
Further, changes in the count values of the shift counter and the read counter are output to the row buffer module as the column address signal.
Further, the main control module also comprises an AHB interface control state machine, which drives the main state machine and the convolution read/write control state machine to read and write data on the AHB bus. Its state transition conditions are determined by the working states of the main state machine, so as to realize burst transfers of data on the AHB bus.
The technical solution of the utility model implements fast convolution, based on a 3*3 sliding window, for images from 2*2 to 1024*1024 in size. Compared with the prior art, it completes the convolution of the pixel data of an entire image using a small amount of hardware resources; the whole image is processed, and the displayed result is not affected by the image boundary. Reading the image data, shifting it, convolving it and writing out the results are all handled in hardware, which saves CPU bandwidth and computing resources and reduces processing time.
Brief description of the drawings
Fig. 1 is an overall block diagram of an accelerating circuit for 3*3 convolution operations provided by an embodiment of the utility model;
Fig. 2 is a block diagram of the internal structure of the convolution calculation module and the shift selection control module provided by an embodiment of the utility model;
Fig. 3 is a schematic diagram of the symmetric filling of the edge pixels of the input image in an embodiment of the utility model;
Fig. 4 is a working state transition diagram of the main state machine provided by an embodiment of the utility model;
Fig. 5 is a working state transition diagram of the convolution read/write control state machine provided by an embodiment of the utility model;
Fig. 6 is a schematic diagram of the convolution kernel data window sliding over the input image, provided by an embodiment of the utility model;
Fig. 7 is a state transition diagram of the AHB interface control state machine provided by an embodiment of the utility model.
Detailed description
Specific embodiments of the utility model are further described below with reference to the accompanying drawings:
The concept of the utility model is as follows. While the window of the 3*3 convolution kernel traverses the input source image, three single-port 1 KB line-buffer SRAMs cache the pixels of the input source image stored in DDR or SRAM, and the state machines together with the column address counter value make the centre pixel of the window slide over the input source image to complete the matrix convolution. The convolution results are then written into a FIFO that is 16 entries deep and 8 bits wide; finally, the data in the FIFO is written back to DDR over the AHB bus and an interrupt is sent to the CPU, thereby realizing hardware-accelerated convolution.
Based on the above concept, an embodiment of the utility model provides an accelerating circuit for 3*3 convolution operations. As shown in Fig. 1, the accelerating circuit comprises a storage medium for storing the input image and the image convolution results, a convolution result FIFO module for buffering convolution results, and an AHB bus for controlling reads and writes of the DDR module; the storage medium may be on-chip SRAM or off-chip DDR. In this embodiment the storage medium is DDR. As shown in Fig. 1, the DDR module contains a pixel storage array, for which the base address and storage space of the input image are configured, and a convolution result storage array, for which the destination address of the image convolution results is configured. The size of the input image ranges from 2*2 to 1024*1024 pixels, and the image is stored in the DDR module as a matrix. In this embodiment the convolution result FIFO module is a FIFO with a depth of 16 and a bit width of 8. Driven by the AHB interface control state machine and under the control of the burst read state machine, the AHB bus burst-reads one burst length of input-image pixel data from the pixel storage array; likewise driven by the AHB interface control state machine and under the control of the burst read state machine, it writes one burst length of convolution results from the convolution result FIFO module into the convolution result storage array. The operation of these state machines uses conventional techniques under the AMBA AHB protocol and supports burst16, burst8, burst4 and burst2 transfers, so it is not described further in this embodiment.
As shown in Fig. 1, the accelerating circuit further comprises a main control module (the main state machine control module), a shift selection control module, a row buffer module and a convolution calculation module. In this embodiment the main control module is a DMA master module. After power-on reset, software initializes the number of pixels in the row direction row_size and the number of pixels in the column direction col_size of the input image in the DDR module, the base address INADDR of the corresponding image data matrix in the pixel storage array, the output base address OUTADDR of the convolution result storage array, and the 3*3 convolution kernel data and the normalization coefficient in the convolution calculation module. The main control module then receives the interrupt enable signal and begins actively issuing access commands to the storage medium. Using state machine control, the main control module generates the corresponding control signals for the DDR module, the shift selection control module, the row buffer module and the convolution calculation module. Specifically, because the design logic of the convolution operation is complex, this embodiment implements the computation and the read/write control flow with state machines: the main control module contains the main state machine, the convolution read/write control state machine, which supports burst writes of convolution results, and the AHB interface control state machine. Through the state transitions of these state machines, the convolution of an image is completed with very little CPU involvement, which greatly saves CPU bandwidth, significantly reduces software processing time and improves software efficiency.
In this embodiment, the main control module, under the control of the burst read state machine, uses the AHB bus to burst-read the pixel data of the current two adjacent rows of the input image from the pixel storage array and loads it into two line buffers of the row buffer module. The burst read length, the storage depth of the row buffer module and the pixel data length of each row of the input image are matched to one another, which helps improve system utilization. According to the column number in the input image of the pixel data that the shift selection control module is currently reading, the main control module controls the parallel shifting of pixel data in the shift selection control module so that the pixel data written into the convolution calculation module each time is aligned with the matching convolution kernel data, which accelerates the calculation. Saying that the pixel data written into the convolution calculation module each time is aligned with the matching convolution kernel data means that, to carry out the convolution, the convolution window holding the kernel data slides over the image matrix corresponding to the pixel data written into the convolution calculation module and forms an overlap region that contains the centre of that convolution window.
The main control module also controls AHB bus burst write operations to write the convolution results in the convolution result FIFO module back to the convolution result storage array. It then reads, over the AHB bus, the pixel data of the row that follows the current two adjacent rows in the input image and applies the corresponding shifting and convolution processing, until all pixel data of the input image has been processed, whereupon it issues an interrupt to inform the CPU of the result of the convolution processing, which reduces software instruction overhead. The convolution kernel data is the data stored in the registers of the window that the CPU outside the accelerating circuit preconfigures in the convolution calculation module.
In this embodiment the input image is stored in matrix form. Because the pixels generally cannot all be read at once, but are read one at a time or in groups of several pixels, processing based on a template window requires a number of shift registers arranged as line buffers. The row buffer module is sized to match the 3*3 convolution window inside the shift selection control module and contains line buffers built from 3 shift registers. It buffers the pixel data of the corresponding rows of the input image according to the horizontal pixel data length of the image, and outputs the buffered pixel data to the shift selection control module according to the column address signal generated by the main control module and the state signal generated by the main state machine. Driven by the AHB interface control state machine, each shift register can store the pixel data of one image row under each burst read operation. It should be noted that the amount of convolution computation involving multiple features is very large and often exceeds the processing capacity of the physical array, so intermediate results must be cached and the computation completed by accumulating over several passes. This mode of computation is supported as follows: the accumulation input of each pass is taken from a cache, and the result is stored back to a cache. Multiple caches may exist; data can be read in from a specified accumulation cache and output to a different specified accumulation cache as required. This cache-to-cache processing mode makes flexible use of hardware resources possible. Writes to a cache take data from the AHB bus by burst reads, and reads from a cache write the buffered data to the AHB bus by burst writes, which suits the pattern of the data processing and reduces the amount of cache needed. In addition, one cache unit can serve multiple circuits, which raises the parallelism of data input and makes large-scale, high-performance parallel processing possible.
Because the convolution requires the kernel data to operate on a window that slides over the image data, shift logic is needed to perform the corresponding shift operations on the image data; the shift logic reads the sequence number of the current convolution operation. In this embodiment the shift logic is the shift selection control module, which selects the pixel data of the corresponding rows of the input image in the row buffer module according to the state signal output by the main control module, and then shifts and buffers the pixel data of each column in parallel, so that the pixel data written into the convolution calculation module is convolved as the window slides. After the pixel data of different rows of the input image has been read in parallel from the row buffer module, it is buffered in the window registers built into the shift selection control module. In addition, when a convolution is to be performed, the data stored in the window built into the shift selection control module determines the size of the pixel data to be convolved in the two dimensions of the plane (horizontal and vertical). According to the sizes in these two dimensions, the shift selection control module changes the ordering of the image data sequence (moving to a new row or a new column), and the amount by which the image data sequence is shifted is determined by the state signal output by the main control module. Based on the window size parameter of the shift selection control module, the convolution calculation module extracts the data stored in the correspondingly arranged registers and multiplies it by the corresponding convolution kernel data.
A known problem with window operations in the prior art is that the border region cannot be processed, so the output image has fewer rows and columns of pixels than the input image. To solve this problem the border data of the image must be handled, namely the pixel data of the first row and the last row of the input image, and the first-column and last-column pixel data of the other rows. The shift selection control module of this embodiment selects the pixel data of the corresponding rows of the input image in the row buffer module according to the state signal output by the main control module, shifts the pixel data of each column in parallel, and pads the pixel data by symmetric data filling, so that all pixel data of the input image is shifted into the convolution calculation module and the convolution of the whole input image with the convolution kernel data is completed.
The convolutional calculation module contains the convolution kernel data. It multiplies the pixel data output by the displacement selection control module by the corresponding convolution kernel data and adds the products in an adder group to realize the convolution operation. In this embodiment, in each convolution operation the pixel data of every image row is shifted in one by one for convolution processing under the control of the relevant state machine. Based on the 3*3 convolution window size parameter, the convolutional calculation module extracts the data stored in the correspondingly arranged flip-flops of the 3*3 convolution window control logic, together with the corresponding convolution kernel data, and completes the multiplication in the multiplier unit.
The 3*3 convolution kernel control logic preconfigured by the CPU is shown in Fig. 2. In this embodiment, the registers in the 3*3 window corresponding to the 3*3 convolution kernel control logic are, from left to right, registers P32, P31 and P30 in the first row; registers P22, P21 and P20 in the second row; and registers P12, P11 and P10 in the third row. The multiply-and-sum logic in the convolutional calculation module is pipelined: nine parallel multipliers multiply the nine data items simultaneously within one clock cycle, and the products are then passed through a tree-structured adder group to obtain the accumulated sum of all terms.
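The multiply-and-sum step described above can be illustrated in software. The following is a minimal sketch, not the circuit itself: the function names `adder_tree` and `conv3x3_point` are hypothetical, the nine products are computed in a list rather than in parallel hardware, and the element-wise pairing of window and kernel entries (no kernel flipping) is an assumption.

```python
def adder_tree(values):
    """Sum values pairwise, level by level, like a tree of two-input adders."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An odd leftover term passes through to the next tree level.
            nxt.append(level[-1])
        level = nxt
    return level[0]


def conv3x3_point(window, kernel):
    """Multiply nine window values by nine kernel values and sum the products."""
    # Models the nine parallel multipliers (assumed element-wise pairing).
    products = [window[r][c] * kernel[r][c] for r in range(3) for c in range(3)]
    return adder_tree(products)


window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
result = conv3x3_point(window, kernel)
```

With nine inputs the tree has four levels (9, 5, 3, 2 terms), which is the kind of depth a pipelined adder group would spread over a few clock cycles.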
As one implementation of the utility model, as shown in Fig. 1, the line buffers of the row buffering module comprise a first line buffer, a second line buffer and a third line buffer, each connected in parallel to the main control module through a column address selection terminal col_addr and a state selection terminal state. Each of the three line buffers is preferably a single-port SRAM of 1 KB. Under the control of the main control module, the first line buffer buffers the pixel data of the first preset row of the input image read by AHB bus burst, the second line buffer buffers the pixel data of the second preset row read by AHB bus burst, and the third line buffer buffers the pixel data of the third preset row read by AHB bus burst.
Specifically, the first preset row, the second preset row and the third preset row are three mutually adjacent row numbers in the input image that the AHB bus burst-reads from the pixel storage array, and the pixel data corresponding to these three row numbers is updated according to the rules of matrix convolution after each AHB bus burst read. It should be understood that the data of the different image rows involved in one convolution can be delivered in parallel to the three line buffers either by row-delay processing of the successively output data or by using different, synchronized data pointers. With this processing the same data is reused by all processing units at the same time, which improves the data reuse rate and simplifies the control circuit design, reducing power consumption.
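The idea of keeping only three adjacent rows buffered and replacing the oldest one after each read can be sketched as follows. This is an illustrative software model under stated assumptions: the function name `stream_conv3x3` is hypothetical, the image is a list of rows rather than an AHB burst stream, and boundary filling is left out so only interior ("valid") outputs are produced.

```python
def stream_conv3x3(image, kernel):
    """3x3 filtering that holds at most three image rows at any time."""
    buffers = [None, None, None]  # stand-ins for the three line buffers
    out = []
    for r, row in enumerate(image):
        buffers[r % 3] = list(row)  # the newest row overwrites the oldest buffer
        if r < 2:
            continue  # three rows are needed before any output
        # Order the buffers from oldest to newest: rows r-2, r-1, r.
        rows = [buffers[(r - 2) % 3], buffers[(r - 1) % 3], buffers[r % 3]]
        out_row = []
        for c in range(len(row) - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += rows[i][c + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
filtered = stream_conv3x3(image, ones)
```

Each image row is fetched once and then reused for three consecutive output rows, which is the data reuse the paragraph above refers to.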
As one implementation of the utility model, as shown in Fig. 1, in order for the whole input image to be convolved in the convolutional calculation module, the displacement selection control module is provided with a first selector S1, a second selector S2, a third selector S3 and 3*3 convolution window control logic. Each of the selectors S1, S2 and S3 has three input terminals: a first input terminal 0, a second input terminal 1 and a third input terminal 2. For the first selector S1, input terminal 0 is connected to the output of the third line buffer, input terminal 1 to the output of the first line buffer, and input terminal 2 to the output of the second line buffer. For the second selector S2, input terminal 0 is connected to the output of the first line buffer, input terminal 1 to the output of the second line buffer, and input terminal 2 to the output of the third line buffer. For the third selector S3, input terminal 0 is connected to the output of the second line buffer, input terminal 1 to the output of the third line buffer, and input terminal 2 to the output of the first line buffer.
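The selector wiring listed above amounts to a rotation of the three line buffers. The sketch below tabulates it; the names `SELECTOR_INPUTS` and `select_rows` are hypothetical, and driving all three selectors with one shared select value is an assumption drawn from the selectors receiving the same status signal.

```python
# For each selector, entry k is the line buffer wired to input terminal k
# (0 = first, 1 = second, 2 = third line buffer), as listed in the text.
SELECTOR_INPUTS = {
    "S1": (2, 0, 1),
    "S2": (0, 1, 2),
    "S3": (1, 2, 0),
}


def select_rows(line_buffers, sel):
    """Return (row0, row1, row2) for a shared select value sel in {0, 1, 2}."""
    return tuple(
        line_buffers[SELECTOR_INPUTS[name][sel]] for name in ("S1", "S2", "S3")
    )


buffers = ["first", "second", "third"]
rotation = [select_rows(buffers, sel) for sel in range(3)]
```

For every select value each line buffer appears on exactly one of row0, row1 and row2, so whichever buffer currently holds the newest row can be presented at the right window row without copying data between buffers.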
The 3*3 convolution window control logic includes a 3*3 convolution window formed by a first shift register, a second shift register and a third shift register, each of which is built from three registers. As shown in Fig. 1, the first row of the 3*3 convolution window corresponds to the first shift register, made up of registers L32, L31 and L30; the second row corresponds to the second shift register, made up of registers L22, L21 and L20; and the third row corresponds to the third shift register, made up of registers L12, L11 and L10.
As shown in Fig. 1, the input of the first shift register (the input of register L32) is connected to the output row0 of the first selector S1 and buffers the pixel data that S1 selects for the first row of the 3*3 convolution window. The input of the second shift register (the input of register L22) is connected to the output row1 of the second selector S2 and buffers the pixel data that S2 selects for the second row of the window. The input of the third shift register is connected to the output row2 of the third selector S3 and buffers the pixel data that S3 selects for the third row of the window. One address input of S1 is connected to one address input of S2, another address input of S2 is connected to one address input of S3, and another address input of S3 is connected to the main control module to receive the status signal.
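The three shift registers can be modelled as three short lists that each take in one pixel per step. This is a minimal sketch with a hypothetical class name `Window3x3`; index 0 of each list stands for the input register (L32, L22 or L12) and index 2 for the last register (L30, L20 or L10), and initializing the registers to zero is an assumption.

```python
class Window3x3:
    """Software model of three 3-stage shift registers forming a 3x3 window."""

    def __init__(self):
        # rows[k] holds [Lx2, Lx1, Lx0] for window row k (assumed reset value 0).
        self.rows = [[0, 0, 0] for _ in range(3)]

    def shift_in(self, column):
        """Shift one pixel from each of row0, row1, row2 into the window."""
        for reg, pixel in zip(self.rows, column):
            reg.pop()             # the oldest pixel leaves the last register
            reg.insert(0, pixel)  # the new pixel enters the input register


w = Window3x3()
for column in [(1, 4, 7), (2, 5, 8), (3, 6, 9)]:
    w.shift_in(column)
```

After three shifts the window holds a full 3*3 patch, with the most recently read column sitting in the input registers.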
The multiplication of the pixel data output by the displacement selection control module with the corresponding convolution kernel data proceeds as follows. While the 3*3 window corresponding to the 3*3 convolution kernel control logic slides over the 3*3 convolution window that stores the input image pixel data, the matrix convolution of the image matrix of the input image with the convolution kernel matrix requires that the center of the 3*3 window and the center of the 3*3 convolution window both lie in the window overlap region. When the relative position of the two windows is as shown in Fig. 6 and the 3*3 window slides from left to right over the input image, the center data P21 of the 3*3 window and the center register L21 of the 3*3 convolution window lie in the window overlap region, and the pixel data in the overlap region corresponds to the output row0 of the first selector S1 and the output row1 of the second selector S2. As the 3*3 window then slides to the right, the displacement selection control module makes the image pixel data supplied by the row buffering module change columns, so the window overlap region changes; the center data P21 and the center register L21 always take part in the convolution, and the data in the 3*3 convolution kernel control logic remains fixed. After the 3*3 window has finished sliding to the right, it slides down one row and continues to slide in the horizontal direction of the input image, with the center data P21 and the center register L21 still framed inside the window overlap region and taking part in the convolution. For the embodiment shown in Fig. 6, the calculation result of the convolutional calculation module is:
Y(0,0) = P32*0 + P31*0 + P30*0 + P22*0 + P21*L32 + P20*L31 + P12*0 + P11*L22 + P10*L21
= P21*L32 + P20*L31 + P11*L22 + P10*L21.
As an embodiment of the utility model, as shown in Fig. 2, the displacement selection control module further includes edge filling logic, comprising a filling selector, edge detection logic and pixel filling logic. The edge detection logic is connected to the pixel filling logic and to the filling selector. For a pixel to be detected, it obtains through the 3*3 convolution window the pixel values of the 8 points surrounding that pixel, computes a result with the Sobel operator structure, and compares the result with a preset threshold in order to judge the address of the pixel in the input image: when the computed value is greater than the threshold, the pixel is determined to be an edge. The judgment result signal is then output to the pixel filling logic to control the pixel filling operation, and at the same time to the selection terminal of the filling selector, to control how pixel data is written into the convolutional calculation module.
The filling selector S10 has a filling input and a non-filling input. It selects how pixel data is written into the convolutional calculation module according to the edge detection logic's judgment of the pixel to be detected. When the pixel data shifted out of the 3*3 convolution window control logic is determined by the edge detection logic to be an edge pixel, the corresponding pixel data is processed by the pixel filling logic and then written into the convolutional calculation module through the filling input of the filling selector S10; otherwise the pixel data shifted out of the 3*3 convolution window control logic is written into the convolutional calculation module directly through the non-filling input of the filling selector S10.
The pixel filling logic symmetrically fills the pixel data shifted out of the 3*3 convolution window control logic according to the edge detection logic's judgment of the pixel to be detected. Specifically, the input-image boundary pixel determined by the edge detection logic is first set as the center of symmetry, and the pixels on the inner side of the input image are filled, symmetrically about that boundary pixel, to the outer side of the input image, so that the image matrix framed by the 3*3 convolution window, centered on the boundary pixel, can complete a planar convolution with the convolution kernel data. The result is then output to the filling input of the filling selector. In other words, the logic judges whether the address in the input image of the pixel data read into the 3*3 convolution window control logic lies on the boundary of the input image; pixel data at the edge of the input image is filled before being output, while pixel data not at the edge is output directly, so that the pixel data in the edge region of the input image can take part in the convolution without being limited by the window size built into the 3*3 convolution window control logic.
One embodiment of the filling of edge pixels is shown in Fig. 3. When the edge detection logic detects that pixel a32, in the first row and first column of the input image, is at the image boundary, the pixel filling logic sets a32 as the center of symmetry and fills pixels a31, a22 and a21 on the inner side of the input image to the outer side of the input image by central symmetry about a32: the inner pixel a21 is filled to the upper left of a32, the inner pixel a22 is filled directly above a32, and the inner pixel a31 is filled to the left of a32. Correspondingly, when the edge detection logic detects pixel a31 in the first row and second column of the input image, the inner pixel a21 is filled directly above a31; and when the edge detection logic detects pixel a22 in the second row and first column of the input image, the inner pixel a21 is filled to the left of a22.
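The symmetric filling in this example matches what is commonly called reflect padding, where the border pixel itself is the mirror axis and is not repeated. The sketch below reproduces the a32 corner case in software; the function name `reflect_pad1` is hypothetical, the pixel values are stand-ins chosen to match the labels (32 for a32, and so on), and the image is assumed to have at least two rows and two columns.

```python
def reflect_pad1(image):
    """Pad a 2D list by one pixel on every side, mirroring about the border."""

    def pad_row(row):
        # The neighbour just inside each end is mirrored to the outside.
        return [row[1]] + list(row) + [row[-2]]

    rows = [pad_row(r) for r in image]
    # The second row is mirrored above the first, and likewise at the bottom.
    return [list(rows[1])] + rows + [list(rows[-2])]


# Stand-in values: 32 is a32 (first row, first column), 31 is a31, and so on.
image = [[32, 31, 30], [22, 21, 20], [12, 11, 10]]
padded = reflect_pad1(image)
```

In `padded`, the value above and to the left of a32 is a21, the value directly above is a22, and the value to the left is a31, as in the embodiment described above.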
As an embodiment of the utility model, the main control module includes a host state machine whose working states are: original state IDLE/0, first-row first-write state 1, second-row first-write state 2, first-row convolution state 6, third-row write state 5, second-row convolution state 7, first-row write state 3, third-row convolution state 8, second-row write state 4 and convolution end state 9. Driven by the AHB interface control state machine, the host state machine makes the state transitions shown in Fig. 4. When the control start signal start is set high, the original state IDLE/0 transitions to the first-row first-write state 1.
In the first-row first-write state 1, while the first-row read/write enable signal w_r[0] = 0, the pixel data of the first preset row is burst-read for the first time over the AHB bus and written into the first line buffer. When w_r[0] = 1, the pixel data of the first preset row has been completely written into the first line buffer, and the state machine enters the second-row first-write state 2; otherwise it remains in the first-row first-write state 1.
In the second-row first-write state 2, while the second-row read/write enable signal w_r[1] = 0, the AHB bus is controlled to burst-read the pixel data of the second preset row for the first time and write it into the second line buffer. When w_r[1] = 1, the pixel data of the second preset row has been completely written into the second line buffer, and the state machine enters the first-row convolution state 6; otherwise it remains in the second-row first-write state 2.
In the first-row convolution state 6, while the first-row convolution enable signal c_r[0] = 0, the pixel data in the first line buffer is read out and, according to the column address enable signal col, shifted into the convolutional calculation module for convolution. When col = 0 and c_r[0] = 1, the pixel data in the first line buffer has been convolved in the convolutional calculation module, but the displacement selection control module has not yet finished shifting the pixel data burst-read over the AHB bus, and the state machine enters the third-row write state 5. When col = 1 and c_r[0] = 1, the pixel data in the first line buffer has been convolved and the displacement selection control module has finished shifting the pixel data burst-read over the AHB bus, and the state machine enters the convolution end state 9. Otherwise it remains in the first-row convolution state 6. Here the pixel data burst-read over the AHB bus is the pixel data of the first preset row and of the second preset row.
In the third-row write state 5, while the third-row read/write enable signal w_r[1] = 0 or the column address enable signal col = 0, the pixel data of the third preset row is burst-read over the AHB bus and written into the third line buffer. When col = 1 or w_r[1] = 1, the pixel data of the third preset row has been completely written into the third line buffer; at the same time the pixel data in the second line buffer begins to be written into the convolutional calculation module for convolution, and the state machine enters the second-row convolution state 7. Otherwise it remains in the third-row write state 5.
In the second-row convolution state 7, while the second-row convolution enable signal c_r[1] = 0, the pixel data in the second line buffer is read out and, according to the column address enable signal col, shifted into the convolutional calculation module for convolution. When col = 0 and c_r[1] = 1, the pixel data in the second line buffer has been convolved in the convolutional calculation module, but the displacement selection control module has not yet finished shifting the pixel data burst-read over the AHB bus, and the state machine enters the first-row write state 3. When col = 1 and c_r[1] = 1, the pixel data in the second line buffer has been convolved and the displacement selection control module has finished shifting the pixel data burst-read over the AHB bus, and the state machine enters the convolution end state 9. Otherwise it remains in the second-row convolution state 7.
In the first-row write state 3, while the first-row read/write enable signal w_r[0] = 0, the AHB bus is controlled to burst-read the pixel data of the updated first preset row and write it into the first line buffer; the data previously held in the first line buffer was already read out in the first-row convolution state 6. When the column address enable signal col = 1 or w_r[0] = 1, the pixel data of the updated first preset row has been completely written into the first line buffer; at the same time the pixel data in the third line buffer begins to be written into the convolutional calculation module for convolution, and the state machine enters the third-row convolution state 8. Otherwise it remains in the first-row write state 3.
In the third-row convolution state 8, while the third-row convolution enable signal c_r[2] = 0, the pixel data in the third line buffer is read out and, according to the column address enable signal col, shifted into the convolutional calculation module for convolution. When col = 0 and c_r[2] = 1, the pixel data in the third line buffer has been convolved in the convolutional calculation module, but the displacement selection control module has not yet finished shifting the pixel data burst-read over the AHB bus, and the state machine enters the second-row write state 4. When col = 1 and c_r[2] = 1, the pixel data in the third line buffer has been convolved and the displacement selection control module has finished shifting the pixel data burst-read over the AHB bus, and the state machine enters the convolution end state 9. Otherwise it remains in the third-row convolution state 8.
In the second-row write state 4, while the second-row read/write enable signal w_r[1] = 0, the AHB bus is controlled to burst-read the pixel data of the updated second preset row and write it into the second line buffer; the data previously held in the second line buffer was already read out in the second-row convolution state 7. When the column address enable signal col = 1 or w_r[1] = 1, the pixel data of the updated second preset row has been completely written into the second line buffer; at the same time the pixel data in the first line buffer begins to be written into the convolutional calculation module for convolution, and the state machine enters the first-row convolution state 6. Otherwise it remains in the second-row write state 4.
In the convolution end state 9, the host state machine controls the AHB bus to write the data in the convolution results FIFO back to the convolution results storage array. When the AHB interface ready signal hready is set high, the state returns to the original state IDLE/0. A new row of pixel data of the input image is then burst-read from the pixel storage array, and the state transitions described above are repeated to process the newly input pixel data.
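The state walk-through above can be condensed into a transition function. The following is a simplified sketch, not the circuit's logic: the state names and the function `next_state` are hypothetical, and the per-row enable bits are collapsed into two assumed inputs, `w_done` (the relevant w_r bit is 1) and `c_done` (the relevant c_r bit is 1), alongside `start`, `col` and `hready` from the text.

```python
# State numbers follow the text: 0 idle, 1-2 first writes, 3-5 row writes,
# 6-8 row convolutions, 9 convolution end.
IDLE, W1_FIRST, W2_FIRST, W1, W2, W3, C1, C2, C3, END = range(10)

AFTER_WRITE = {W1_FIRST: W2_FIRST, W2_FIRST: C1, W3: C2, W1: C3, W2: C1}
AFTER_CONV = {C1: W3, C2: W1, C3: W2}


def next_state(state, start=0, w_done=0, c_done=0, col=0, hready=0):
    """Return the next host state for one evaluation of the inputs."""
    if state == IDLE:
        return W1_FIRST if start else IDLE
    if state in (W1_FIRST, W2_FIRST):
        return AFTER_WRITE[state] if w_done else state
    if state in (W1, W2, W3):
        # The text leaves these write states on col = 1 or the write bit = 1.
        return AFTER_WRITE[state] if (w_done or col) else state
    if state in AFTER_CONV:
        if not c_done:
            return state
        return END if col else AFTER_CONV[state]
    if state == END:
        return IDLE if hready else END
    raise ValueError(f"unknown state {state}")


trace = [IDLE]
steps = [
    dict(start=1), dict(w_done=1), dict(w_done=1), dict(c_done=1),
    dict(w_done=1), dict(c_done=1), dict(w_done=1), dict(c_done=1),
    dict(w_done=1), dict(c_done=1, col=1), dict(hready=1),
]
for inputs in steps:
    trace.append(next_state(trace[-1], **inputs))
```

The trace visits the steady-state loop 6, 5, 7, 3, 8, 4, 6 once before col = 1 ends the run, which shows how each convolution state is paired with a refill of the line buffer that was consumed one step earlier.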
Specifically, the host state machine further includes a ring counter that generates the status signal corresponding to the working state of the host state machine. Under the control of the status signal and the column address enable signal, after the pixel data of the current row has been convolved in the convolutional calculation module, the pixel data of the next row is written column by column into the convolutional calculation module according to the column address signal, while the row pixel data still to be processed in the pixel storage array continues to be burst-read over the AHB bus into the empty line buffer of the row buffering module. The host state machine therefore cycles through the three working states first-row write state 3, second-row write state 4 and third-row write state 5, traversing and reading the pixel data of the different rows of the input image from the pixel storage array until the pixel data of all rows of the input image in the pixel storage array has been written into the convolutional calculation module and convolved. The count values of the ring counter correspond to the first-row write state 3, the second-row write state 4 and the third-row write state 5 respectively, and thus generate the status signal.
The main control module further includes a convolution operation read/write control state machine, which makes its state transitions while the host state machine is in the first-row convolution state 6, the second-row convolution state 7 or the third-row convolution state 8. It causes the 3*3 convolution window control logic to read in parallel the data on the output row0 of the first selector S1, the output row1 of the second selector S2 and the output row2 of the third selector S3, to shift that data and perform the convolution calculation, and to write the convolution results into the convolution results FIFO; then, driven by the AHB interface control state machine, the data is read out of the convolution results FIFO and written out on the AHB interface. The convolution calculation completes within 3 clock cycles, which leaves ample timing margin for the operation of the main control module.
As an embodiment of the utility model, the working states of the convolution operation read/write control state machine are: original state IDLE, first read state RD_ROWO, read row buffer state RD_BUF, shift-and-write-FIFO state SHFT, write-FIFO wait state SHFT_WAIT, write bus state BWR, write bus wait state BWR_WAIT and write complete state BWR_END. The operation of the convolution operation read/write control state machine is shown in Fig. 5:
Control of the convolution operation begins when the host state machine is in the first-row convolution state 6, the second-row convolution state 7 or the third-row convolution state 8, at which point the convolution operation read/write control state machine jumps from the original state IDLE to the first read state RD_ROWO. In the first read state RD_ROWO, the pixel data of the first column in the first line buffer is read and, as selected by the displacement selection control module, enters the 3*3 convolution window; during this process the shift registers in the displacement selection control module begin to perform column-change operations on the pixel data. The state then transitions to the read row buffer state RD_BUF.
In the read row buffer state RD_BUF, the displacement selection control module reads the pixel data in the row buffering module other than the first column of the first line buffer and shifts it column by column into the 3*3 convolution window, so that the register index in the window of the data read and stored during the first read state RD_ROWO changes. This continues until the displacement selection control module has read all the image pixel data of the current row in the row buffering module; the displacement selection control module then performs a row-change operation, reads the pixel data of the next row of the input image, and the state transitions to the shift-and-write-FIFO state SHFT.
Because the convolution operation slides the convolution kernel data over the image data window by window, shift logic is needed to perform the corresponding shift operations on the image data, and this shift logic reads the sequence number of the current convolution operation. In this embodiment of the utility model, the pixel data of the corresponding rows of the input image in the row buffering module is selected according to the status signal output by the main control module, and each column of pixel data is then shifted and buffered in parallel, so that the pixel data written into the convolutional calculation module is convolved as the window slides. When a convolution operation is set up, the data stored in the window built into the displacement selection control module determines the size of the pixel data to be convolved in the two dimensions of the two-dimensional plane (the horizontal and vertical pixel directions). According to the size of these two dimensions, the displacement selection control module changes the order in which the image data is sequenced (row changes and column changes), and the status signal output by the main control module determines how far the image data sequence is shifted, so that the pixel data of each image row entering the convolutional calculation module for a convolution operation is aligned with the convolution kernel data.
In the shift-and-write-FIFO state SHFT, while the count value of the shift counter wa_cout is not equal to 3, the pixel data in the 3*3 convolution window is shifted column by column into the convolutional calculation module; a convolution operation can be performed only once one row of pixel data of the input image has been written into the convolutional calculation module. When the count value of the read counter r_cout is not equal to 0, the count value of wa_cout is equal to 3, and the data depth D_F written into the convolution results FIFO is less than the burst write length B_L, the convolutional calculation module has completed one convolution operation, but the number of convolution results written from it into the convolution results FIFO is less than B_L and the shift registers of the 3*3 convolution window have not yet stored a complete row of image pixel data. The state then returns to the read row buffer state RD_BUF so that the image pixel data being read continues to be shifted column by column into the 3*3 convolution window, after which one row-change operation is performed on the input image. When the count value of r_cout is equal to 0 and the count value of wa_cout is equal to 3, the shift registers of the 3*3 convolution window have traversed all the pixel data of the input image and all of that pixel data has been written into the convolutional calculation module for convolution, and the state transitions to the write-FIFO wait state SHFT_WAIT. When the count value of r_cout is equal to 0, the count value of wa_cout is equal to 3, and the data depth D_F written into the convolution results FIFO is greater than or equal to B_L, the pixel data of the input image has all been written into the convolutional calculation module and convolved, and the number of convolution results written into the convolution results FIFO is greater than or equal to B_L, so the state transitions to the write bus state BWR. In the state SHFT, the pixel data input in parallel is convolved, and after each convolution operation a row change is performed on the input image before the next convolution operation, until the count value of wa_cout reaches 3.
In the write-FIFO wait state SHFT_WAIT, the count value of the shift counter wa_cout is not equal to 3 and no new convolution results are supplied to the convolution results FIFO module. When all the calculation results currently in the convolutional calculation module have been written into the convolution results FIFO module, and the storage depth D_F of the convolution results FIFO module is greater than or equal to the burst write data length B_L configured for the AHB bus, the state transitions to the write bus state BWR.
In the write bus state BWR, when the burst write flag B_W = 1, the count value of the write counter wr_cout is non-zero and the count value of the read counter r_cout is equal to 0, the convolution results FIFO has completed writing one burst write length of data to the AHB bus, but the non-zero count of wr_cout shows that the calculation results in the convolutional calculation module have not all been written into the convolution results FIFO module, so the state transitions to the write-FIFO wait state SHFT_WAIT. When B_W = 1, the count value of wr_cout is non-zero and the count value of r_cout is also non-zero, the input image pixel data has not all been written into the convolutional calculation module for convolution, so the state transitions to the read row buffer state RD_BUF. When B_W = 1 and wr_cout = 0, the convolution results FIFO module, having completed writing one burst write length of data to the AHB bus, writes its remaining convolution results to the AHB bus, and the state transitions to the write bus wait state BWR_WAIT. Each convolution result corresponds to the pixel data currently stored in the 3*3 convolution window. While the convolution results FIFO has not completed writing one full burst write length of data to the AHB bus, the state machine stays in the write bus state BWR.
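The branching out of the write bus state can be summarized as a small decision function. This is only a sketch of the conditions stated above; the function name `bwr_next` is hypothetical, and treating B_W = 0 as "burst not yet complete, stay in BWR" is an assumption.

```python
def bwr_next(b_w, wr_cout, r_cout):
    """Next state after BWR, given the burst flag and the two counters."""
    if not b_w:
        return "BWR"        # assumed: burst write still in progress
    if wr_cout == 0:
        return "BWR_WAIT"   # nothing left to move into the FIFO
    if r_cout == 0:
        return "SHFT_WAIT"  # results still pending in the calculation module
    return "RD_BUF"         # more input pixels still to be read


cases = {
    (1, 5, 0): bwr_next(1, 5, 0),
    (1, 5, 2): bwr_next(1, 5, 2),
    (1, 0, 0): bwr_next(1, 0, 0),
    (0, 5, 2): bwr_next(0, 5, 2),
}
```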
In the write-bus wait state BWR_WAIT, when the write counter wr_cout=0, the read counter r_cout=0, or the AHB interface ready signal hready=0, all convolution results in the convolution results FIFO module have been written to the AHB bus and the state transitions to the write-complete state BWR_END; otherwise the state machine remains in the write-bus wait state BWR_WAIT. When the AHB interface ready signal hready=1, the write-complete state BWR_END jumps back to the initial state IDLE, and the write-out of the input image pixel data on the AHB bus is complete.
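The transitions out of BWR, BWR_WAIT and BWR_END described above can be sketched as a pure next-state function. This is only an illustrative software model, not the circuit itself: the names `State` and `next_state` are hypothetical, the sketch assumes that B_W=1 marks a completed burst write, and it takes the BWR_WAIT exit condition literally as "any one of wr_cout=0, r_cout=0, hready=0".

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RD_BUF = auto()
    SHFT_WAIT = auto()
    BWR = auto()
    BWR_WAIT = auto()
    BWR_END = auto()


def next_state(state, b_w, wr_cout, r_cout, hready):
    """Return the next state for the write-out part of the state machine."""
    if state is State.BWR:
        if b_w != 1:
            # Assumption: B_W stays 0 until one full burst has been written.
            return State.BWR
        if wr_cout == 0:
            # Nothing left to compute: flush the FIFO remainder to the bus.
            return State.BWR_WAIT
        # wr_cout is non-zero: results are still pending.
        return State.SHFT_WAIT if r_cout == 0 else State.RD_BUF
    if state is State.BWR_WAIT:
        # Literal reading of the text: any one of the three conditions.
        if wr_cout == 0 or r_cout == 0 or hready == 0:
            return State.BWR_END
        return State.BWR_WAIT
    if state is State.BWR_END:
        return State.IDLE if hready == 1 else State.BWR_END
    # Other states (IDLE, RD_BUF, SHFT_WAIT, ...) are not modeled here.
    return state
```

For example, `next_state(State.BWR, b_w=1, wr_cout=2, r_cout=0, hready=1)` yields `State.SHFT_WAIT`, matching the first branch of the BWR description.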
Specifically, the count value of the shift counter wa_cout serves as the line-feed offset with which the pixel data stored in the 3*3 convolution window is written to the convolutional calculation module; the count value of the read counter r_cout serves as the column offset generated when the pixel data stored in the 3*3 convolution window is shifted in parallel. The changes in the count values of the shift counter wa_cout and the read counter r_cout are output as the column address signal to the row buffering module, so that the convolution algorithm read-write control state machine controls the displacement selection control module to complete the row-change and column-change reads and writes of the input image pixel data.
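As a rough software analogy for the parallel shift, the sketch below feeds one column from three line buffers into three 3-stage shift registers per step, using a column address in the role the text gives to r_cout. It is an assumption-laden illustration: `slide_window` is a hypothetical name, the line buffers are plain Python lists, and the shift counter wa_cout is not modeled.

```python
from collections import deque


def slide_window(rows):
    """Yield (center_column, 3x3 window) as columns are shifted in.

    `rows` holds three equal-length lists standing in for the line buffers.
    """
    # Three shift registers of three stages each form the 3*3 window.
    regs = [deque([0, 0, 0], maxlen=3) for _ in range(3)]
    for r_cout in range(len(rows[0])):  # column address into the buffers
        for reg, row in zip(regs, rows):
            reg.append(row[r_cout])  # parallel shift: one pixel per row
        if r_cout >= 2:
            # The window is full once three columns have been shifted in.
            yield r_cout - 1, [list(reg) for reg in regs]
```

With three rows of four pixels each, the generator yields two windows, centered on columns 1 and 2; edge columns would need the fill logic, which this sketch leaves out.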
The AHB interface control state machine includes an initial state, a non-continuous transfer state and a continuous transfer state. It drives the host state machine and the convolution algorithm read-write control state machine to read and write data on the AHB bus, reflects the state of the burst reads and writes of the AHB bus controlled by the main control module, and has its state transition conditions determined by the working states of the host state machine, so as to realize burst transfer of data on the AHB bus. A jump of the AHB interface control state machine from the initial state to the non-continuous transfer state indicates the transfer of the first data item of one row of pixel data of the input image under the current burst read operation; the main control module burst-reads the pixel data of the current two adjacent rows of the input image from the pixel storage array over the AHB bus, to realize the convolution operation as the 3*3 convolution window slides. While the AHB interface control state machine jumps from the non-continuous transfer state to the continuous transfer state, the main control module writes the operation results of the pixel data of the current two adjacent rows in the convolutional calculation module to the convolution results storage array over the AHB bus. While the AHB interface control state machine jumps from the continuous transfer state back to the initial state, the burst read and burst write operations of the AHB bus controlled by the main control module end; the AHB interface ready signal hready is then set high, and the main control module starts a new burst transfer.
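The three states of the AHB interface control state machine can be illustrated with a small trace generator. The sketch assumes that the initial, non-continuous and continuous transfer states correspond to the AHB transfer types IDLE, NONSEQ and SEQ, where the first beat of a burst is NONSEQ and the remaining beats are SEQ. The function name `ahb_burst_trace` is hypothetical, and wait states are modeled in a simplified way by holding the current state whenever hready is 0.

```python
from itertools import repeat

IDLE, NONSEQ, SEQ = "IDLE", "NONSEQ", "SEQ"


def ahb_burst_trace(beats, hready=None):
    """Return the per-cycle state trace for one burst of `beats` transfers."""
    if beats < 1:
        raise ValueError("a burst needs at least one beat")
    # Without an explicit pattern the bus is assumed to be always ready.
    ready = iter(hready) if hready is not None else repeat(1)
    state, remaining, trace = IDLE, beats, []
    while True:
        trace.append(state)
        if state == IDLE and remaining == 0:
            return trace  # burst finished, back in the initial state
        if next(ready, 1) == 0:
            continue  # wait state: hold the current state for a cycle
        if state == IDLE:
            state = NONSEQ  # first beat of the burst
        else:
            remaining -= 1  # the current beat has completed
            state = SEQ if remaining > 0 else IDLE
```

A 4-beat burst on an always-ready bus gives `["IDLE", "NONSEQ", "SEQ", "SEQ", "SEQ", "IDLE"]`.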
Parts not described in detail in this utility model specification belong to the common knowledge of those skilled in the art. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Claims (8)

1. An accelerating circuit for a 3*3 convolution algorithm, comprising a DDR module for storing an input image and image convolution results, and a convolution results FIFO module for buffering convolution results, wherein the DDR module includes a pixel storage array configured with the base address and storage space of the input image, and a convolution results storage array configured with the destination address of the image convolution results; characterized in that the accelerating circuit includes a main control module, a displacement selection control module, a row buffering module and a convolutional calculation module;
The main control module is used to burst-read the pixel data of the current two adjacent rows of the input image from the pixel storage array over the AHB bus, and to control the parallel shifting of pixel data in the displacement selection control module so that the pixel data written to the convolutional calculation module each time is aligned with the matching convolution kernel data to accelerate the calculation, and then to write the operation results of the pixel data of the current two adjacent rows in the convolutional calculation module to the convolution results storage array over the AHB bus; the main control module is also used to read, over the AHB bus, the pixel data of the next row following the current two adjacent rows in the input image, to perform the corresponding shifting and convolution processing, and, after the data of all pixels of the input image has been processed, to issue an interrupt informing the CPU of the result of the convolution processing so as to reduce software instruction overhead; wherein the convolution kernel data is the data stored in the convolution window preconfigured by the CPU in the convolutional calculation module; aligning the pixel data written to the convolutional calculation module each time with the matching convolution kernel data means that, to realize the convolution operation, the convolution window holding the convolution kernel data is controlled to slide over the image matrix corresponding to the pixel data written to the convolutional calculation module, forming an overlapping region that contains the center of the convolution window holding the convolution kernel data;
The row buffering module includes line buffers formed of shift registers, and is used to buffer the pixel data of the corresponding rows of the input image according to the horizontal pixel data length of the image, and to output the buffered pixel data of the input image to the displacement selection control module according to the column address signal and status signal generated by the main control module;
The displacement selection control module is used to select, according to the status signal output by the main control module, the pixel data of the corresponding rows of the input image in the row buffering module, then to shift the pixel data of each column in parallel and apply fill processing to the pixel data, so that all pixel data of the input image is written to the convolutional calculation module to complete the convolution operation;
The convolutional calculation module is used to multiply the pixel data output by the displacement selection control module with the corresponding convolution kernel data, and to sum the multiplication results with an adder group to realize the convolution operation.
2. The accelerating circuit according to claim 1, characterized in that the line buffers of the row buffering module include a first line buffer, a second line buffer and a third line buffer, each connected in parallel to the main control module through a column address selection terminal and a state selection terminal, wherein: the first line buffer is used to buffer, under the control of the main control module, the pixel data of a first preset row of the input image burst-read over the AHB bus; the second line buffer is used to buffer, under the control of the main control module, the pixel data of a second preset row of the input image burst-read over the AHB bus; the third line buffer is used to buffer, under the control of the main control module, the pixel data of a third preset row of the input image burst-read over the AHB bus; the first preset row, the second preset row and the third preset row are three mutually adjacent row numbers in the input image burst-read from the pixel storage array over the AHB bus, and the pixel data corresponding to these three row numbers is updated according to the matrix convolution rule after being burst-read over the AHB bus.
3. The accelerating circuit according to claim 2, characterized in that the displacement selection control module includes a first selector, a second selector, a third selector and 3*3 convolution window control logic;
The first selector, the second selector and the third selector each have three input terminals, namely a first input terminal, a second input terminal and a third input terminal, wherein the first input terminal of the first selector is connected to the output of the third line buffer, the second input terminal of the first selector is connected to the output of the first line buffer, and the third input terminal of the first selector is connected to the output of the second line buffer;
The first input terminal of the second selector is connected to the output of the first line buffer, the second input terminal of the second selector is connected to the output of the second line buffer, and the third input terminal of the second selector is connected to the output of the third line buffer;
The first input terminal of the third selector is connected to the output of the second line buffer, the second input terminal of the third selector is connected to the output of the third line buffer, and the third input terminal of the third selector is connected to the output of the first line buffer;
The 3*3 convolution window control logic includes a 3*3 convolution window formed by a first shift register, a second shift register and a third shift register, each of which is composed of three registers; the input of the first shift register is connected to the output of the first selector and buffers the pixel data selected by the first selector to enter the first row of the 3*3 convolution window; the input of the second shift register is connected to the output of the second selector and buffers the pixel data selected by the second selector to enter the second row of the 3*3 convolution window; the input of the third shift register is connected to the output of the third selector and buffers the pixel data selected by the third selector to enter the third row of the 3*3 convolution window;
Wherein, one address input terminal of the first selector is connected to one address input terminal of the second selector, another address input terminal of the second selector is connected to one address input terminal of the third selector, and another address input terminal of the third selector is connected to the main control module to receive the status signal.
4. The accelerating circuit according to claim 3, characterized in that the displacement selection control module further includes edge fill logic, which includes a fill selector, edge detection logic and pixel fill logic;
The edge detection logic is connected to the pixel fill logic and to the fill selector, and is used to determine the address position in the input image of the pixel under detection in the 3*3 convolution window, and to output the judgment result signal to the pixel fill logic and to the selection terminal of the fill selector;
The fill selector includes a fill input terminal and a no-fill input terminal, and is used to select, according to the judgment result signal of the edge detection logic received at its selection terminal, the pixel data of the corresponding input terminal to be written to the convolutional calculation module;
The pixel fill logic is used to symmetrically fill the pixel data shifted out of the 3*3 convolution window control logic according to the judgment result signal of the edge detection logic, so that the image matrix framed by the 3*3 convolution window centered on a boundary pixel completes the planar convolution with the convolution kernel data, and to output the result to the fill input terminal of the fill selector.
5. The accelerating circuit according to claim 2, characterized in that the main control module includes a host state machine, whose working states include a first-row first-write state, a second-row first-write state, a first-row convolution state, a third-row write state, a second-row convolution state, a first-row write state, a third-row convolution state and a second-row write state;
The host state machine is used, in the first-row first-write state, to control the AHB bus through the first-row read-write enable signal to burst-read, for the first time, the pixel data of the first preset row, and to write it to the first line buffer;
In the second-row first-write state, the second-row read-write enable signal controls the AHB bus to burst-read, for the first time, the pixel data of the second preset row, which is written to the second line buffer;
In the first-row convolution state, the first-row convolution enable signal controls the reading of the pixel data in the first line buffer, and the pixel data read out is shifted and written to the convolutional calculation module for convolution according to the column address enable signal;
In the third-row write state, the third-row read-write enable signal controls the AHB bus to burst-read the pixel data of the third preset row, which is written to the third line buffer;
In the second-row convolution state, the second-row convolution enable signal controls the reading of the pixel data in the second line buffer, and the pixel data read out is shifted and written to the convolutional calculation module for convolution according to the column address enable signal;
In the first-row write state, the first-row read-write enable signal controls the AHB bus to burst-read the pixel data of the updated first preset row, which is written to the first line buffer;
In the third-row convolution state, the third-row convolution enable signal controls the reading of the pixel data in the third line buffer, and the pixel data read out is shifted and written to the convolutional calculation module for convolution according to the column address enable signal;
In the second-row write state, the second-row read-write enable signal controls the AHB bus to burst-read the pixel data of the updated second preset row, which is written to the second line buffer;
The host state machine further includes a ring counter for generating the status signal corresponding to the working state of the host state machine;
Wherein, under the control of the status signal, after the pixel data of the current row has undergone convolution in the convolutional calculation module, the pixel data of the next row is written column by column to the convolutional calculation module, and the row pixel data still to be processed in the pixel storage array continues to be burst-read over the AHB bus into the vacated line buffer, so that the pixel data read in the first-row write state, the second-row write state and the third-row write state is continually updated, until the pixel data of all rows of the input image in the pixel storage array has been written to the convolutional calculation module to complete the convolution operation.
6. The accelerating circuit according to claim 5, characterized in that the main control module further includes a convolution algorithm read-write control state machine, which performs state transitions while the host state machine is in the first-row convolution state, the second-row convolution state or the third-row convolution state;
The working states of the convolution algorithm read-write control state machine include: a first-read state, a read-row-buffer state, a shift-write-FIFO state, a write-FIFO wait state, a write-bus state and a write-bus wait state;
The convolution algorithm read-write control state machine is used, in the first-read state, to read the pixel data of the first row in the first line buffer and select it to enter the 3*3 convolution window;
In the read-row-buffer state, the pixel data in the row buffering module other than the first row of the first line buffer is read and selected to enter the 3*3 convolution window;
In the shift-write-FIFO state, the pixel data in the 3*3 convolution window is shifted column by column and transferred to the convolutional calculation module for convolution according to the count value of the generated shift counter, and the 3*3 convolution window is line-feed shifted and transferred to the convolutional calculation module for convolution according to the count value of the generated read counter;
In the write-FIFO wait state, the calculation results of the convolutional calculation module are written to the convolution results FIFO module, until the storage depth of the convolution results FIFO module is greater than or equal to the burst write data length configured for the AHB bus;
In the write-bus state, the convolution results stored in the convolution results FIFO module are written to the AHB bus according to the count value of the generated write counter, until the results of the convolution calculation for the pixel data of all rows and columns of the input image in the pixel storage array have all been written to the convolution results FIFO module;
In the write-bus wait state, it is determined from the count value of the write counter or the read counter that the results of the convolution calculation for the pixel data of all rows and columns of the input image in the pixel storage array have been written to the AHB bus;
Wherein, the count value of the shift counter serves as the row number with which the pixel data stored in the 3*3 convolution window is written to the convolutional calculation module; the count value of the read counter serves as the column number generated when the pixel data stored in the 3*3 convolution window is shifted in parallel.
7. The accelerating circuit according to claim 6, characterized in that the changes in the count values of the shift counter and the read counter are output as the column address signal to the row buffering module.
8. The accelerating circuit according to claim 7, characterized in that the main control module further includes an AHB interface control state machine, used to drive the host state machine and the convolution algorithm read-write control state machine to read and write data on the AHB bus, with its state transition conditions determined by the working states of the host state machine, so as to realize burst transfer of data on the AHB bus.
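For orientation only, the data path recited in the claims (three line buffers refilled in rotation, selectors that reorder them into window rows, edge filling, and a multiply-then-add stage) can be modeled behaviorally in a few lines. This is a minimal sketch under stated assumptions rather than a description of the circuit: the names `mirror` and `conv3x3_line_buffered` are hypothetical, "symmetric fill" is assumed to mean reflection about the boundary pixel, the kernel is applied without flipping, and bus bursts, the FIFO and the state machines are not modeled.

```python
def mirror(i, n):
    """Reflect an out-of-range index about the border pixel (assumed fill rule)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def conv3x3_line_buffered(image, kernel):
    """3*3 multiply-accumulate over `image` using three rotating line buffers."""
    h, w = len(image), len(image[0])
    if h < 2 or w < 2:
        raise ValueError("mirror fill needs at least a 2x2 image")
    line_bufs = [None, None, None]
    out = []
    # Padded row indices -1 .. h are fetched in order, one row per step.
    for step, r in enumerate(range(-1, h + 1)):
        # "Burst read": the new row overwrites the oldest line buffer.
        line_bufs[step % 3] = image[mirror(r, h)]
        if step < 2:
            continue  # the window needs three buffered rows
        # "Selectors": order the buffers oldest to newest as window rows 0..2.
        rows = [line_bufs[(step - 2 + k) % 3] for k in range(3)]
        out_row = []
        for c in range(w):
            acc = 0
            for k in range(3):
                for j in range(3):
                    # Multiplier array followed by the adder group.
                    acc += kernel[k][j] * rows[k][mirror(c - 1 + j, w)]
            out_row.append(acc)
        out.append(out_row)
    return out
```

With a kernel whose only non-zero entry is a 1 at the center, the function returns the input image unchanged, which is a quick sanity check on the buffer rotation and the fill rule.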
CN201821189844.5U 2018-07-26 2018-07-26 The accelerating circuit of 3*3 convolution algorithm Withdrawn - After Issue CN208766715U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201821189844.5U CN208766715U (en) 2018-07-26 2018-07-26 The accelerating circuit of 3*3 convolution algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201821189844.5U CN208766715U (en) 2018-07-26 2018-07-26 The accelerating circuit of 3*3 convolution algorithm

Publications (1)

Publication Number Publication Date
CN208766715U true CN208766715U (en) 2019-04-19

Family

ID=66129396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201821189844.5U Withdrawn - After Issue CN208766715U (en) 2018-07-26 2018-07-26 The accelerating circuit of 3*3 convolution algorithm

Country Status (1)

Country Link
CN (1) CN208766715U (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681984A (en) * 2018-07-26 2018-10-19 珠海市微半导体有限公司 A kind of accelerating circuit of 3*3 convolution algorithms
CN108681984B (en) * 2018-07-26 2023-08-15 珠海一微半导体股份有限公司 Acceleration circuit of 3*3 convolution algorithm
CN110222818A (en) * 2019-05-13 2019-09-10 西安交通大学 A kind of more bank ranks intertexture reading/writing methods for the storage of convolutional neural networks data
CN110647978A (en) * 2019-09-05 2020-01-03 北京三快在线科技有限公司 System and method for extracting convolution window in convolution neural network
CN111080507A (en) * 2019-11-18 2020-04-28 中国航空工业集团公司西安航空计算技术研究所 TLM microstructure for GPU hardware image processing convolution filtering system
CN111080507B (en) * 2019-11-18 2022-12-06 中国航空工业集团公司西安航空计算技术研究所 TLM microstructure for GPU hardware image processing convolution filtering system
CN111679286A (en) * 2020-05-12 2020-09-18 珠海市一微半导体有限公司 Laser positioning system and chip based on hardware acceleration
CN111679286B (en) * 2020-05-12 2022-10-14 珠海一微半导体股份有限公司 Laser positioning system and chip based on hardware acceleration
CN113489925A (en) * 2021-06-01 2021-10-08 中国科学院上海技术物理研究所 Focal plane detector reading circuit for realizing convolution calculation
CN113489925B (en) * 2021-06-01 2022-07-08 中国科学院上海技术物理研究所 Focal plane detector reading circuit for realizing convolution calculation

Similar Documents

Publication Publication Date Title
CN208766715U (en) The accelerating circuit of 3*3 convolution algorithm
CN108681984A (en) A kind of accelerating circuit of 3*3 convolution algorithms
CN102208005B (en) 2-dimensional (2-D) convolver
US20180189643A1 (en) Convolution circuit, application processor including the same, and operating method thereof
CN101441271B (en) SAR real time imaging processing device based on GPU
CN103020890B (en) Based on the visual processing apparatus of multi-level parallel processing
CN111414994B (en) FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
CN107392309A (en) A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA
CN106683158A (en) Modeling structure of GPU texture mapping non-blocking memory Cache
RU2623806C1 (en) Method and device of processing stereo images
GB2298111A (en) Improvements relating to computer 3d rendering systems
CN107748723A (en) Storage method and access device supporting conflict-free stepping block-by-block access
CN105550978B (en) A kind of GPU 3D engine on piece memory hierarchy towards unified dyeing framework
CN114092338B (en) Image zooming fast calculation method
CN114461978A (en) Data processing method and device, electronic equipment and readable storage medium
CN114359662A (en) Implementation method of convolutional neural network based on heterogeneous FPGA and fusion multiresolution
CN109446478A (en) A kind of complex covariance matrix computing system based on iteration and restructural mode
CN110515872A (en) Direct memory access method, apparatus, dedicated computing chip and heterogeneous computing system
CN104869284A (en) High-efficiency FPGA implementation method and device for bilinear interpolation amplification algorithm
CN117217274A (en) Vector processor, neural network accelerator, chip and electronic equipment
CN111275608B (en) Remote sensing image orthorectification parallel system based on FPGA
CN1105358C (en) Semiconductor memory having arithmetic function, and processor using the same
CN107679117A (en) A kind of whole audience dense point Rapid matching system
WO2021070303A1 (en) Computation processing device
CN116090530A (en) Systolic array structure and method capable of configuring convolution kernel size and parallel calculation number

Legal Events

Date Code Title Description
GR01 Patent grant
AV01 Patent right actively abandoned (granted publication date: 20190419; effective date of abandoning: 20230815)