CN208766715U - The accelerating circuit of 3*3 convolution algorithm - Google Patents
- Publication number: CN208766715U
- Application number: CN201821189844.5U
- Authority: CN (China)
- Prior art keywords: convolution, row, pixel data, state, write
- Legal status: Withdrawn - After Issue (this status is Google's assumption, not a legal conclusion; Google has performed no legal analysis)
Abstract
The utility model discloses an accelerating circuit for 3*3 convolution operations, comprising a DDR module, a convolution result FIFO module, a main state machine control module, a shift selection control module, a line buffer module and a convolution calculation module. The main control module burst-reads, over the AHB bus, the pixel data of the current two adjacent rows of the input image from the pixel storage array, and controls the parallel shifting of the pixel data in the shift selection control module so that the pixel data written into the convolution calculation module each time is convolved with the corresponding convolution kernel data. Over the AHB bus it then writes the results that the convolution calculation module computes for the current adjacent rows into the convolution result storage array, reads the pixel data of the next row, and applies the same shifting and convolution processing. After all pixels of the input image have been processed, it raises an interrupt to inform the CPU of the convolution result, which reduces software instruction overhead.
Description
Technical field
The utility model relates to the field of machine vision inspection technology, and in particular to an accelerating circuit for 3*3 convolution operations.
Background art
Currently, when a robot vacuum cleaner (sweeper) builds a map and localizes itself from video images, it must preprocess the image data acquired by its camera with a set of image processing algorithms, for example image filtering, image noise removal, image feature enhancement and image smoothing.
In the existing field of machine vision, window processing is a common image processing technique: an arithmetic operation is applied to the image matrix through a small matrix of fixed size (for example 3*3). Common window operations include morphological operations, blur filtering and Gaussian filtering, among which convolution is widely used. However, computing a convolution in software requires the software to read in the image data, cache it, compute, and then write the results out. This consumes a large number of software instructions and occupies substantial software resources, which makes the process inefficient.
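For reference, the software computation described above can be sketched in Python as below. This is an illustrative model only, not code from the patent, and the function name `conv3x3_valid` is hypothetical. As in most image-filtering code, the kernel is applied without flipping (strictly a cross-correlation), which is identical to true convolution for symmetric kernels such as a box or Gaussian blur. Each output pixel costs nine reads and nine multiply-accumulates, and windows that would overhang the border are skipped.

```python
def conv3x3_valid(image, kernel):
    """Apply a 3*3 kernel at every position where the window fits inside the image.

    image:  list of rows of pixel values
    kernel: 3*3 list of coefficients, applied without flipping
    Returns a (rows-2) x (cols-2) list of multiply-accumulate results.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):            # window center row
        out_row = []
        for c in range(1, cols - 1):        # window center column
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += image[r + dr][c + dc] * kernel[dr + 1][dc + 1]
            out_row.append(acc)
        out.append(out_row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1, 1]] * 3
print(conv3x3_valid(image, box))   # [[54, 63], [90, 99]]
```

With a 4*4 image only the 2*2 interior is produced: the output loses one row and one column on every side, which is the boundary problem of window operations discussed later in this description.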
Summary of the utility model
To save software resources and improve instruction efficiency, the utility model implements fast convolution based on a 3*3 sliding window as an application-specific integrated circuit. The design supports hardware parallelism and pipelining, which accelerates the algorithm. The technical solution is as follows:
An accelerating circuit for 3*3 convolution operations comprises a DDR module for storing the input image and the image convolution results, and a convolution result FIFO module for buffering the convolution results. The DDR module includes a pixel storage array, configured with the base address and storage space of the input image, and a convolution result storage array, configured with the destination address of the image convolution results. The accelerating circuit further comprises a main control module, a shift selection control module, a line buffer module and a convolution calculation module.
The main control module burst-reads, over the AHB bus, the pixel data of the current two adjacent rows of the input image from the pixel storage array, and controls the parallel shifting of pixel data in the shift selection control module so that the pixel data written into the convolution calculation module each time is aligned with the matching convolution kernel data, which accelerates the computation. It then writes, over the AHB bus, the results that the convolution calculation module computes for the pixel data of the current two adjacent rows into the convolution result storage array. The main control module also reads, over the AHB bus, the pixel data of the next row of the input image following the current two adjacent rows, and applies the corresponding shifting and convolution processing. After all pixel data of the input image has been processed, it raises an interrupt to inform the CPU of the convolution result, reducing software instruction overhead. The convolution kernel data is the data stored in the convolution window that the CPU preconfigures in the convolution calculation module. Aligning the pixel data written into the convolution calculation module with the matching kernel data means that, to realize the convolution, the convolution window containing the kernel data slides over the image matrix formed by the written pixel data, and at each position an overlapping region containing the center of the convolution window is formed.
The line buffer module comprises line buffers built from shift registers. It buffers the pixel data of the corresponding rows of the input image according to the horizontal pixel data length of the image, and outputs the buffered pixel data to the shift selection control module according to the column address signal and the status signal generated by the main control module.
The shift selection control module selects, according to the status signal output by the main control module, the pixel data of the corresponding row of the input image in the line buffer module, shifts each column of pixel data in parallel, and pads the pixel data, so that all pixel data of the input image is written into the convolution calculation module and convolved.
The convolution calculation module multiplies the pixel data output by the shift selection control module with the corresponding convolution kernel data, and adds the products with an adder group to complete the convolution.
Further, the line buffers of the line buffer module comprise a first line buffer, a second line buffer and a third line buffer, each connected in parallel to the main control module through a column address select terminal and a state select terminal. Under the control of the main control module, the first line buffer buffers the pixel data of a first preset row of the input image burst-read over the AHB bus, the second line buffer buffers the pixel data of a second preset row, and the third line buffer buffers the pixel data of a third preset row. The first, second and third preset rows are three adjacent row numbers of the input image burst-read over the AHB bus from the pixel storage array, and after each burst read the pixel data held for these three rows is updated according to the row-by-row progression of the matrix convolution.
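The row update rule can be modeled as a rotation. The sketch below assumes that image row k is always burst-read into line buffer k mod 3; this is an interpretation consistent with the main state machine sequence described later, not a statement from the patent, and the function name is hypothetical. Under that assumption the rows r-1, r and r+1 needed to convolve row r always sit in three different buffers, and loading the next row only overwrites a row that no later window needs. Buffers are numbered 0 to 2 here, corresponding to the first to third line buffers.

```python
def simulate_line_buffers(num_rows):
    """Model three line buffers that are refilled row by row.

    Row k is burst-read into buffer k % 3 (assumed rotation). Row r is
    convolved once rows r-1, r and r+1 (those inside the image) are resident.
    Returns (center_row, rows_held_by_buffers_0_1_2) for every convolved row.
    """
    buffers = [None, None, None]        # image row currently held by each buffer
    next_row = 0                        # next image row to burst-read
    log = []
    for center in range(num_rows):
        # Load rows until row center+1 (or the last row) is resident.
        while next_row <= min(center + 1, num_rows - 1):
            buffers[next_row % 3] = next_row    # replaces the row loaded 3 rows earlier
            next_row += 1
        needed = [r for r in (center - 1, center, center + 1) if 0 <= r < num_rows]
        if not all(r in buffers for r in needed):
            raise RuntimeError(f"row {center}: window rows not all buffered")
        log.append((center, tuple(buffers)))
    return log


for center, held in simulate_line_buffers(5):
    print(center, held)
```

For a 5-row image this prints `0 (0, 1, None)`, `1 (0, 1, 2)`, `2 (3, 1, 2)`, `3 (3, 4, 2)` and `4 (3, 4, 2)`: only three row-sized buffers are ever needed, whatever the image height.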
Further, the shift selection control module comprises a first selector, a second selector, a third selector and a 3*3 convolution window control logic. Each selector has three input terminals: a first, a second and a third input terminal. The first, second and third input terminals of the first selector are connected to the outputs of the third, first and second line buffers, respectively. The first, second and third input terminals of the second selector are connected to the outputs of the first, second and third line buffers, respectively. The first, second and third input terminals of the third selector are connected to the outputs of the second, third and first line buffers, respectively.
The 3*3 convolution window control logic includes a 3*3 convolution window composed of a first, a second and a third shift register, each built from three registers. The input of the first shift register is connected to the output of the first selector and buffers the pixel data that the first selector routes into the first row of the 3*3 convolution window. The input of the second shift register is connected to the output of the second selector and buffers the pixel data routed into the second row. The input of the third shift register is connected to the output of the third selector and buffers the pixel data routed into the third row. One address input of the first selector is connected to one address input of the second selector, the other address input of the second selector is connected to one address input of the third selector, and the other address input of the third selector is connected to the main control module to receive the status signal.
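The selector wiring above amounts to a cyclic rotation of the three line buffers. The sketch below assumes that all three selectors receive the same select value (the chained address inputs suggest a shared status signal) and that select value k picks input terminal k; both are assumptions, since the text does not state the encoding, and the names are hypothetical.

```python
# Input terminals (first, second, third) of each selector, as wired in the text.
# Values are line-buffer numbers 1..3.
SELECTOR_INPUTS = {
    "first":  (3, 1, 2),   # feeds window row 1 (top)
    "second": (1, 2, 3),   # feeds window row 2 (middle)
    "third":  (2, 3, 1),   # feeds window row 3 (bottom)
}


def window_row_sources(status):
    """Line buffers feeding window rows 1-3 for a shared select value.

    Assumes select value k (1, 2 or 3) picks input terminal k on every selector.
    """
    if status not in (1, 2, 3):
        raise ValueError("status must be 1, 2 or 3")
    return tuple(SELECTOR_INPUTS[sel][status - 1]
                 for sel in ("first", "second", "third"))


for k in (1, 2, 3):
    print(k, window_row_sources(k))
# 1 (3, 1, 2)
# 2 (1, 2, 3)
# 3 (2, 3, 1)
```

Under these assumptions the middle window row always comes from line buffer k, while the buffers cyclically before and after it supply the rows above and below, so the window is correctly ordered no matter which buffer holds the current center row.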
Further, the shift selection control module also comprises edge filling logic, which includes a fill selector, edge detection logic and pixel filling logic. The edge detection logic is connected to both the pixel filling logic and the fill selector. It determines where in the input image the pixel under test in the 3*3 convolution window lies, and outputs the result as a judgment signal to the pixel filling logic and to the select terminal of the fill selector. The fill selector has a filled input and an unfilled input and, according to the judgment signal received at its select terminal, writes the pixel data from the corresponding input into the convolution calculation module. The pixel filling logic symmetrically pads the pixel data shifted out of the 3*3 convolution window control logic according to the judgment signal, so that the image matrix framed by the 3*3 convolution window around a boundary pixel can complete the planar convolution with the convolution kernel data; the padded data is output to the filled input of the fill selector.
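A behavioral sketch of the edge detection logic and fill selector follows. Fig. 3 defines the actual filling pattern; here symmetric filling is assumed to mean reflection about the border pixel (row -1 takes row 1, row n takes row n-2), although an edge-inclusive mirror (row -1 takes row 0) would be an equally plausible reading. For each of the nine window positions the sketch reports whether the fill selector would take the filled input and which image pixel supplies the value. Function names are hypothetical.

```python
def mirror(i, n):
    """Reflect an out-of-range index about the border pixel (assumed scheme):
    -1 -> 1 and n -> n - 2; in-range indices are returned unchanged."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def window_taps(r, c, rows, cols):
    """For the 3*3 window centered on pixel (r, c) of a rows x cols image,
    return nine (use_filled_input, (src_row, src_col)) pairs in raster order."""
    taps = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            # Edge detection: does this tap fall outside the image?
            outside = not (0 <= rr < rows and 0 <= cc < cols)
            # Fill selector: outside taps take the mirrored (filled) pixel.
            taps.append((outside, (mirror(rr, rows), mirror(cc, cols))))
    return taps


print(window_taps(0, 0, 4, 4)[:3])
# [(True, (1, 1)), (True, (1, 0)), (True, (1, 1))]
```

Interior windows never select the filled input; only windows centered on the first or last row or column do.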
Further, the main control module comprises a main state machine whose working states are: first-row initial write, second-row initial write, first-row convolution, third-row write, second-row convolution, first-row write, third-row convolution and second-row write.
In the first-row initial write state, the main state machine uses the first-row read/write enable signal to make the AHB bus burst-read, for the first time, the pixel data of the first preset row, and writes it into the first line buffer. In the second-row initial write state, it uses the second-row read/write enable signal to burst-read, for the first time, the pixel data of the second preset row, and writes it into the second line buffer. In the first-row convolution state, the first-row convolution enable signal reads out the pixel data in the first line buffer, and the column address enable signal shifts this data into the convolution calculation module for convolution. In the third-row write state, the third-row read/write enable signal burst-reads the pixel data of the third preset row and writes it into the third line buffer. In the second-row convolution state, the second-row convolution enable signal reads out the pixel data in the second line buffer, and the column address enable signal shifts it into the convolution calculation module for convolution. In the first-row write state, the first-row read/write enable signal burst-reads the pixel data of the updated first preset row and writes it into the first line buffer. In the third-row convolution state, the third-row convolution enable signal reads out the pixel data in the third line buffer, and the column address enable signal shifts it into the convolution calculation module for convolution. In the second-row write state, the second-row read/write enable signal burst-reads the pixel data of the updated second preset row and writes it into the second line buffer.
The main state machine also comprises a ring counter that generates the status signal corresponding to its working state. Under the control of the status signal, once the current row of pixel data has been convolved in the convolution calculation module, the next row is written column by column into the convolution calculation module, and the next unprocessed row in the pixel storage array is burst-read over the AHB bus into the line buffer that has just been freed. The pixel data loaded in the first-row, second-row and third-row write states is thus continuously updated until the pixel data of every row of the input image in the pixel storage array has been written into the convolution calculation module and convolved.
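The state order above can be replayed with a short model. After the two initial writes, the ring counter cycles through six states; the sketch below records which image row each state handles. The state labels are hypothetical shorthand for the states named in the text, and skipping write states once every row has been loaded is an assumption about how the last rows are handled.

```python
# Six states the ring counter cycles through after the two initial writes.
RING = ("CONV1", "WRITE3", "CONV2", "WRITE1", "CONV3", "WRITE2")


def main_state_sequence(num_rows):
    """Return the (state, image_row) pairs visited for an image of num_rows rows.

    WRITEk loads an image row into line buffer k; CONVk convolves the row held
    in line buffer k. Write states are skipped once every row is loaded (assumed).
    """
    seq = [("WRITE1_FIRST", 0), ("WRITE2_FIRST", 1)]
    next_load, next_conv, step = 2, 0, 0
    while next_conv < num_rows:
        state = RING[step % len(RING)]      # ring counter output
        step += 1
        if state.startswith("CONV"):
            seq.append((state, next_conv))
            next_conv += 1
        elif next_load < num_rows:
            seq.append((state, next_load))
            next_load += 1
    return seq


for state, row in main_state_sequence(5):
    print(state, row)
```

Each CONVk state handles a row r with r mod 3 = k - 1, and the write states before it have already loaded rows r - 1 and r + 1, which matches the line buffer rotation described above.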
Further, the main control module also comprises a convolution read/write control state machine, which performs its own state transitions while the main state machine is in the first-row, second-row or third-row convolution state. Its working states are: initial read, read line buffer, shift write FIFO, write FIFO wait, write bus and write bus wait.
In the initial read state, it selects the pixel data of the first row, held in the first line buffer, into the 3*3 convolution window. In the read line buffer state, it selects the pixel data in the line buffer module other than that first row into the 3*3 convolution window. In the shift write FIFO state, it shifts the pixel data in the 3*3 convolution window column by column according to the count value of the generated shift counter and passes it to the convolution calculation module for convolution, and moves the 3*3 convolution window to the next line according to the count value of the generated read counter, again passing the data to the convolution calculation module. In the write FIFO wait state, it writes the results of the convolution calculation module into the convolution result FIFO module until the amount of data stored in the FIFO is greater than or equal to the burst write data length configured for the AHB bus. In the write bus state, it writes the convolution results stored in the convolution result FIFO module onto the AHB bus according to the count value of the generated write counter, until the results of convolving the pixel data of all rows and columns of the input image in the pixel storage array have all passed through the convolution result FIFO module. In the write bus wait state, it determines from the count value of the write counter or the read counter whether all of these convolution results have been written to the AHB bus.
The count value of the shift counter stores the row number at which pixel data of the 3*3 convolution window is written into the convolution calculation module, and the count value of the read counter stores the column number generated while the pixel data of the 3*3 convolution window is shifted in parallel.
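The write FIFO wait and write bus states can be modeled as a threshold rule: results accumulate in the FIFO until it holds at least one configured burst, and that burst is then written over the AHB bus. The sketch below uses the 16-entry FIFO of this embodiment and a hypothetical function name. The text does not say how the final partial burst at the end of the image is handled, so flushing it as one short transfer is an assumption.

```python
from collections import deque


def drain_results(results, burst_len=16, fifo_depth=16):
    """Model the write FIFO wait / write bus states.

    Each result is pushed into the FIFO; once the FIFO holds burst_len entries
    they are popped and issued as one AHB burst. A final partial burst is
    flushed after the last result (assumed behavior).
    Returns the list of bursts in the order they are written.
    """
    if not 0 < burst_len <= fifo_depth:
        raise ValueError("burst length must fit in the FIFO")
    fifo, bursts = deque(), []
    for value in results:
        fifo.append(value)                      # write FIFO wait state
        if len(fifo) >= burst_len:              # threshold reached: write bus state
            bursts.append([fifo.popleft() for _ in range(burst_len)])
    if fifo:
        bursts.append(list(fifo))               # leftover results at end of image
    return bursts


print([len(b) for b in drain_results(range(40))])   # [16, 16, 8]
```

Because a burst is issued as soon as the threshold is reached, the FIFO never holds more than one burst, so a 16-entry FIFO is enough for 16-beat bursts.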
Further, changes in the count values of the shift counter and the read counter are output to the line buffer module as the column address signal.
Further, the main control module also comprises an AHB interface control state machine, which drives the main state machine and the convolution read/write control state machine to read and write data on the AHB bus, and determines its own state transition conditions from the working states of the main state machine, so as to implement burst transfers of data on the AHB bus.
The technical solution of the utility model uses a 3*3 sliding window to perform fast convolution of images from 2*2 to 1024*1024 pixels. Compared with the prior art, it completes the convolution of all pixel data of an entire image at the cost of a small amount of hardware resources; the processed image is complete, and the displayed image is not degraded at the image boundary. Because reading, shifting, convolving and writing out the image data are all handled by the circuit, CPU bandwidth and computing resources are saved and processing time is reduced.
Brief description of the drawings
Fig. 1 is an overall block diagram of an accelerating circuit for 3*3 convolution operations according to an embodiment of the utility model;
Fig. 2 is an internal block diagram of the convolution calculation module and the shift selection control module according to an embodiment of the utility model;
Fig. 3 is a schematic diagram of the symmetric filling of the edge pixels of the input image in an embodiment of the utility model;
Fig. 4 is a working state transition diagram of the main state machine according to an embodiment of the utility model;
Fig. 5 is a working state transition diagram of the convolution read/write control state machine according to an embodiment of the utility model;
Fig. 6 is a schematic diagram of the convolution kernel data window sliding over the input image in an embodiment of the utility model;
Fig. 7 is a state transition diagram of the AHB interface control state machine according to an embodiment of the utility model.
Detailed description of the embodiments
Specific embodiments of the utility model are further described below with reference to the accompanying drawings:
The idea of the utility model is as follows. While the window of the 3*3 convolution kernel traverses the input source image, three single-port 1KB line buffer SRAMs cache the pixels of the input source image stored in DDR or SRAM. The state machines and the column address counter values slide the center pixel of the window across the input source image to complete the matrix convolution. The convolution results are written into a FIFO with 16 entries of 8 bits each, and the data in the FIFO is finally written back to DDR over the AHB bus, after which an interrupt is sent to the CPU. Convolution is thereby accelerated in hardware.
Based on the above idea, an embodiment of the utility model provides an accelerating circuit for 3*3 convolution operations. As shown in Fig. 1, the accelerating circuit comprises a storage medium for storing the input image and the image convolution results, a convolution result FIFO module for buffering the convolution results, and an AHB bus for reading and writing the DDR module. The storage medium comprises on-chip SRAM and off-chip DDR; this embodiment uses DDR. As shown in Fig. 1, the DDR module contains a pixel storage array, configured with the base address and storage space of the input image, and a convolution result storage array, configured with the destination address of the image convolution results. The input image ranges from 2*2 to 1024*1024 pixels and is stored in the DDR module as a matrix. In this embodiment, the convolution result FIFO module is a FIFO with a depth of 16 and a width of 8 bits. Driven by the AHB interface control state machine and under the control of the burst read state machine, the AHB bus burst-reads one burst length of input image pixel data from the pixel storage array; likewise, driven by the AHB interface control state machine and under the control of the burst state machine, it writes one burst length of convolution results from the convolution result FIFO module into the convolution result storage array. These state machines use conventional techniques of the AMBA AHB protocol and support burst16, burst8, burst4 and burst2 transfers, so they are not described further in this embodiment.
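The supported burst lengths suggest how a transfer that is not a multiple of 16 beats might be broken up. The greedy split below is a hypothetical illustration only. For context, AMBA AHB defines fixed-length bursts of 4, 8 and 16 beats (INCR4/8/16 and WRAP4/8/16) plus SINGLE and undefined-length INCR transfers, so a 2-beat "burst2" would presumably be issued as a short INCR burst; the fallback to a single transfer for an odd remainder is also an assumption.

```python
# Burst lengths named in the text, longest first.
SUPPORTED_BURSTS = (16, 8, 4, 2)


def split_into_bursts(beats):
    """Greedily cover `beats` transfers with the supported burst lengths,
    using one single transfer for an odd remainder (assumed fallback)."""
    if beats < 0:
        raise ValueError("beats must be non-negative")
    plan = []
    for size in SUPPORTED_BURSTS:
        while beats >= size:
            plan.append(size)
            beats -= size
    plan.extend([1] * beats)    # at most one beat can remain after size 2
    return plan


print(split_into_bursts(30))    # [16, 8, 4, 2]
print(split_into_bursts(7))     # [4, 2, 1]
```

Longest-first splitting keeps the number of bus transactions (and their address phases) to a minimum for any length.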
As shown in Fig. 1, the accelerating circuit further comprises the main state machine control module, the shift selection control module, the line buffer module and the convolution calculation module. In this embodiment the main control module is a DMA master module. After power-on reset, software initializes the row-direction pixel count row_size and the column-direction pixel count col_size of the input image in the DDR module, the base address INADDR of the corresponding image data matrix in the pixel storage array, the output base address OUTADDR of the convolution result storage array, and the 3*3 convolution kernel data and normalization coefficient in the convolution calculation module. The main control module then receives the interrupt enable signal and begins actively issuing access commands to the storage medium. Using state machine control, it generates the corresponding control signals for the DDR module, the shift selection control module, the line buffer module and the convolution calculation module. Because the convolution logic is complex, this embodiment implements the computation and read/write control with dedicated state machines: the main control module comprises the main state machine, the convolution read/write control state machine, which supports burst writes of convolution results, and the AHB interface control state machine. Through the state transitions of these state machines, the convolution of an image is completed with very little CPU involvement, which greatly saves CPU bandwidth, markedly reduces software execution time and improves software efficiency.
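The software initialization step can be pictured as filling in a small parameter set before enabling the circuit. The dataclass below is a hypothetical software-side view, not the circuit's register map: field names mirror the parameters named above, the size limits come from the stated 2*2 to 1024*1024 range, and the output size assumes one 8-bit result per input pixel (matching the 8-bit result FIFO width and a same-size output image). How the normalization coefficient is applied is not specified, so it is only stored.

```python
from dataclasses import dataclass


@dataclass
class ConvConfig:
    """Hypothetical software-side view of the parameters set after reset."""
    row_size: int          # pixel count in the row direction
    col_size: int          # pixel count in the column direction
    inaddr: int            # INADDR: base address of the input image matrix
    outaddr: int           # OUTADDR: base address of the result array
    kernel: tuple          # nine 3*3 kernel coefficients, row-major
    norm: int = 1          # normalization coefficient (application not modeled)

    def validate(self):
        """Reject parameters outside the ranges stated in the text."""
        for name, size in (("row_size", self.row_size), ("col_size", self.col_size)):
            if not 2 <= size <= 1024:
                raise ValueError(f"{name} must be between 2 and 1024")
        if len(self.kernel) != 9:
            raise ValueError("kernel must contain 9 coefficients")

    def output_bytes(self):
        """Result array size, assuming one 8-bit result per input pixel."""
        return self.row_size * self.col_size


cfg = ConvConfig(row_size=640, col_size=480, inaddr=0x8000_0000,
                 outaddr=0x8010_0000, kernel=(1,) * 9, norm=9)
cfg.validate()
print(cfg.output_bytes())   # 307200
```

Validating on the software side before raising the enable keeps out-of-range sizes from ever reaching the hardware.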
In this embodiment, under the control of the burst read state machine, the main control module burst-reads over the AHB bus the pixel data of the current two adjacent rows of the input image in the pixel storage array and loads it into two line buffers of the line buffer module. The burst read length, the storage depth of each line buffer and the number of pixels per row of the input image match one another, which improves system utilization. According to the column number of the input image corresponding to the pixel data currently read by the shift selection control module, the main control module shifts the pixel data in the shift selection control module in parallel, so that the pixel data written into the convolution calculation module each time is aligned with the matching convolution kernel data, which accelerates the computation. Here, alignment means that, to realize the convolution, as the convolution window containing the kernel data slides over the image matrix formed by the pixel data written into the convolution calculation module, an overlapping region containing the center of the convolution window is formed.
The main control module also controls AHB burst writes that write the convolution results in the convolution result FIFO module back to the convolution result storage array. It then reads, over the AHB bus, the pixel data of the next row of the input image following the current two adjacent rows and applies the corresponding shifting and convolution processing. After all pixel data of the input image has been processed, it raises an interrupt to inform the CPU of the convolution result, reducing software instruction overhead. The convolution kernel data is the data stored in registers in the window that the CPU external to the accelerating circuit preconfigures in the convolution calculation module.
In this embodiment the input image is stored as a matrix. Because pixels are generally not all read at once but one at a time or a few at a time, window-based processing requires shift registers arranged as line buffers. The line buffer module contains three line buffers built from shift registers, matching the size of the 3*3 convolution window in the shift selection control module. It buffers the pixel data of the corresponding rows of the input image according to the horizontal pixel data length, and outputs the buffered pixel data to the shift selection control module according to the column address signal generated by the main control module and the status signal generated by the main state machine. Driven by the AHB interface control state machine, each shift register can store one image row of pixel data per burst read operation.
Note that convolutions involving multiple features require a very large amount of computation, often beyond the processing capacity of the physical array; intermediate results must then be cached and accumulated over several passes. This is supported as follows: the accumulation input of each pass is taken from a cache, and the result is also stored back to a cache. There may be several caches, so data can be read from a specified accumulation cache and written to a different specified one. This cache-to-cache processing mode allows flexible use of hardware resources. Caches are written by burst reads that fetch data from the AHB bus and read out by burst writes that put the buffered data onto the AHB bus, which fits the data processing pattern and reduces the amount of cache needed. In addition, one cache unit can serve several circuits, which increases the parallelism of data input and makes large-scale, high-performance parallel processing possible.
Because convolution slides the kernel data as a window over the image data, shift logic is needed to shift the image data accordingly; the shift logic reads the current convolution sequence number. In this embodiment the shift logic is the shift selection control module. According to the status signal output by the main control module, it selects the pixel data of the corresponding row of the input image in the line buffer module, then shifts and caches each column of pixel data in parallel, so that convolution is completed as the window slides over the pixel data written into the convolution calculation module. After the line buffer module reads different rows of the input image in parallel, the pixel data is buffered in the window registers inside the shift selection control module. The window storage in the shift selection control module also determines the two dimensions of the pixel data to be convolved in the image plane (horizontal and vertical). According to these two dimensions, the shift selection control module changes the order of the image data sequence (moving to a new row or a new column), and the status signal output by the main control module determines how far the image data sequence is shifted. Based on the window size parameters of the shift selection control module, the convolution calculation module extracts the data stored in the correspondingly arranged registers and multiplies it with the corresponding convolution kernel data.
A known problem of window operations in the prior art is that the boundary is not processed, so the output image has fewer rows and columns of pixels than the input image. To solve this, the image boundary data must be handled, namely the first and last rows of the input image and the first and last columns of every other row. The shift selection control module of this embodiment selects, according to the status signal output by the main control module, the pixel data of the corresponding row of the input image in the line buffer module, shifts each column in parallel, and pads the data by symmetric filling. The shift selection control module can therefore shift all pixel data of the input image into the convolution calculation module, completing the convolution of the whole input image with the convolution kernel data.
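To illustrate the effect of edge filling, the sketch below adds a one-pixel border and then convolves, so the output has the same size as the input. The mirror scheme used here (reflection about the border pixel, excluding the border pixel itself) is an assumption standing in for Fig. 3, the kernel is applied without flipping, and the function names are hypothetical.

```python
def pad_symmetric(image):
    """Add a one-pixel border by reflecting about the edge pixels (assumed scheme)."""
    rows = [[row[1]] + row + [row[-2]] for row in image]      # left/right border
    return [rows[1]] + rows + [rows[-2]]                      # top/bottom border


def conv3x3_same(image, kernel):
    """3*3 multiply-accumulate at every pixel; output is the size of the input."""
    padded = pad_symmetric(image)
    height, width = len(image), len(image[0])
    return [[sum(padded[r + i][c + j] * kernel[i][j]
                 for i in range(3) for j in range(3))
             for c in range(width)]
            for r in range(height)]


box = [[1, 1, 1]] * 3
print(conv3x3_same([[1, 2], [3, 4]], box))   # [[27, 24], [21, 18]]
```

Even the minimum 2*2 image yields a full 2*2 result, which is the property claimed for the 2*2 to 1024*1024 range.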
Convolutional calculation module: holds the convolution kernel data, multiplies the pixel data output by the displacement selection control module by the corresponding convolution kernel data, and adds the products with an adder group to perform the convolution. In the embodiment of the utility model, in each convolution operation the pixel data of each image row are shifted one by one under the control of the relevant state machine for convolution processing. Based on the 3*3 convolution window size parameter, the convolutional calculation module extracts the data stored in the correspondingly arranged flip-flops of the 3*3 convolution window control logic and multiplies them by the corresponding convolution kernel data in the multiplier unit.
The 3*3 convolution kernel control logic preconfigured by the CPU is shown in Fig. 2. In the embodiment of the utility model, the registers in the first row of the 3*3 window of the convolution kernel control logic are, from left to right, P32, P31 and P30; the registers in the second row are P22, P21 and P20; and the registers in the third row are P12, P11 and P10. The multiply-and-sum logic in the convolutional calculation module is pipelined: 9 parallel multipliers multiply 9 data values simultaneously within one clock cycle, and a tree-structured adder group then sums the products to obtain the accumulated result of all terms.
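The multiply-then-tree-add structure can be modeled in software to see how many adder levels nine products need. This is a behavioral sketch only, assuming a plain binary tree in which an odd operand passes straight to the next level; the patent does not give the exact tree shape, and `adder_tree` and `mac3x3` are hypothetical names.

```python
def adder_tree(values):
    """Sum values pairwise, level by level, like a tree-structured adder group.

    Returns (total, levels); an odd value left over at a level passes through unchanged.
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


def mac3x3(window, kernel):
    """Nine products (one per parallel multiplier) reduced by the adder tree."""
    products = [window[r][c] * kernel[r][c] for r in range(3) for c in range(3)]
    return adder_tree(products)


print(adder_tree(range(1, 10)))  # (45, 4): 9 -> 5 -> 3 -> 2 -> 1 operands
```

Four adder levels follow from ceil(log2 9) = 4; in hardware, pipeline registers between levels would let a new window enter every clock cycle, although where the patent places those registers is not stated.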
In one implementation of the utility model, as shown in Fig. 1, the row buffering module includes a first line buffer, a second line buffer and a third line buffer, each connected in parallel to the main control module through a column address select end col_addr and a state select end state. The first, second and third line buffers preferably each use a 1 KB single-port SRAM. Under the control of the main control module, the first line buffer buffers the pixel data of the first preset row of the input image read by AHB bus burst, the second line buffer buffers the pixel data of the second preset row, and the third line buffer buffers the pixel data of the third preset row.
Specifically, the first, second and third preset rows are three mutually adjacent rows of the input image read by AHB bus burst from the pixel storage array, and after being read, the pixel data of these three rows are updated according to the rules of matrix convolution. It should be understood that, for the data of the different image rows involved in one convolution, row-delay processing ensures that the data output step by step are delivered in parallel to the 3 line buffer units; the same effect can also be achieved with separate, synchronized data pointers. With this processing, the same data are reused simultaneously by all processing units, which raises the data reuse rate, simplifies the control circuit design and reduces power consumption.
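The data reuse described above can be illustrated with a rolling three-line buffer in which each image row is read only once and then serves as the top, middle and bottom row of three successive windows. This is a sketch under the assumption that row n is written into line buffer n % 3 (the buffer whose row is no longer needed); `three_line_windows` is a hypothetical name, and `None` stands for a neighbor row that lies outside the image and is left to the edge filling logic.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

Row = Sequence[int]
Triple = Tuple[Optional[Row], Row, Optional[Row]]


def three_line_windows(rows: Iterable[Row]) -> Iterator[Triple]:
    """Yield (top, middle, bottom) once per image row, reading every row only once.

    Row n is written into line buffer n % 3, overwriting the row that is no longer
    needed; None marks a neighbour outside the image (left to the edge filling logic).
    """
    buf: List[Optional[Row]] = [None, None, None]
    n = -1
    for n, row in enumerate(rows):
        buf[n % 3] = row
        if n >= 1:                       # row n-1 now has both neighbours available
            mid = n - 1
            top = buf[(mid - 1) % 3] if mid >= 1 else None
            yield top, buf[mid % 3], buf[n % 3]
    if n >= 0:                           # last row: nothing below it
        top = buf[(n - 1) % 3] if n >= 1 else None
        yield top, buf[n % 3], None


for triple in three_line_windows([[0], [1], [2], [3]]):
    print(triple)
# (None, [0], [1])
# ([0], [1], [2])
# ([1], [2], [3])
# ([2], [3], None)
```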
In one implementation of the utility model, as shown in Fig. 1, in order to complete the convolution of the entire input image in the convolutional calculation module, the displacement selection control module contains a first selector S1, a second selector S2, a third selector S3 and 3*3 convolution window control logic. Each of S1, S2 and S3 has three inputs: a first input 0, a second input 1 and a third input 2. For S1, input 0 is connected to the output of the third line buffer, input 1 to the output of the first line buffer, and input 2 to the output of the second line buffer. For S2, input 0 is connected to the output of the first line buffer, input 1 to the output of the second line buffer, and input 2 to the output of the third line buffer. For S3, input 0 is connected to the output of the second line buffer, input 1 to the output of the third line buffer, and input 2 to the output of the first line buffer.
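The selector wiring above amounts to a rotation of the three line buffers. The sketch below encodes that wiring as lookup tables and assumes, since the address inputs of S1, S2 and S3 are chained together, that all three selectors receive the same select value; the table and function names are hypothetical.

```python
# Wiring from the text, as 0-based line-buffer indices (0 = first, 1 = second, 2 = third).
S1_INPUTS = (2, 0, 1)  # S1: input 0 <- third, input 1 <- first, input 2 <- second
S2_INPUTS = (0, 1, 2)  # S2: input 0 <- first, input 1 <- second, input 2 <- third
S3_INPUTS = (1, 2, 0)  # S3: input 0 <- second, input 1 <- third, input 2 <- first


def select_rows(buffers, sel):
    """Outputs (row0, row1, row2) when S1, S2 and S3 all receive the same select value."""
    return (buffers[S1_INPUTS[sel]],
            buffers[S2_INPUTS[sel]],
            buffers[S3_INPUTS[sel]])


for sel in range(3):
    print(sel, select_rows(("first", "second", "third"), sel))
# 0 ('third', 'first', 'second')
# 1 ('first', 'second', 'third')
# 2 ('second', 'third', 'first')
```

Under this assumption, select value k always places line buffer k on the middle output row1, the previous buffer on row0 and the next buffer on row2, so the three window rows stay in image order no matter which physical buffer was overwritten most recently.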
3*3 convolution window control logic: includes a 3*3 convolution window made up of a first shift register, a second shift register and a third shift register, each consisting of three registers. As shown in Fig. 1, the first row of the 3*3 convolution window corresponds to the first shift register, made up of registers L32, L31 and L30; the second row corresponds to the second shift register, made up of registers L22, L21 and L20; and the third row corresponds to the third shift register, made up of registers L12, L11 and L10.
As shown in Fig. 1, the input of the first shift register (the input of register L32) is connected to output row0 of the first selector S1 and buffers the pixel data that S1 selects into the first row of the 3*3 convolution window. The input of the second shift register (the input of register L22) is connected to output row1 of the second selector S2 and buffers the pixel data that S2 selects into the second row. The input of the third shift register is connected to output row2 of the third selector S3 and buffers the pixel data that S3 selects into the third row. One address input of S1 is connected to one address input of S2, the other address input of S2 is connected to one address input of S3, and the other address input of S3 is connected to the main control module to receive the status signal.
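Each window row behaves as a three-stage shift register fed by one selector output. A minimal behavioral sketch, assuming data move one stage toward L30/L20/L10 per clock, as a shift-register chain implies; the class name and methods are hypothetical.

```python
class Window3x3:
    """Three 3-stage shift registers. Index 0 is the input stage (L32, L22, L12),
    index 2 the last stage (L30, L20, L10)."""

    def __init__(self):
        self.regs = [[0, 0, 0] for _ in range(3)]

    def shift_in(self, row0, row1, row2):
        """Clock in one new column from the selector outputs row0/row1/row2."""
        for reg, px in zip(self.regs, (row0, row1, row2)):
            reg[2], reg[1], reg[0] = reg[1], reg[0], px  # oldest column drops out of stage 2

    def centre(self):
        return self.regs[1][1]  # register L21


w = Window3x3()
for column in ((1, 2, 3), (4, 5, 6), (7, 8, 9)):
    w.shift_in(*column)
print(w.regs)      # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(w.centre())  # 5
```

After three clocks the window holds three full columns; every further clock slides it one pixel to the next column position.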
The pixel data output by the displacement selection control module are multiplied by the corresponding convolution kernel data as follows. While the 3*3 window of the 3*3 convolution kernel control logic slides over the 3*3 convolution window that stores the input image pixel data, completing the matrix convolution of the input image matrix with the convolution kernel matrix requires the centers of both windows to lie in the window overlap region. Fig. 6 shows the relative position of the two windows. When the kernel window slides from left to right over the input image, its center datum P21 and the center register L21 of the 3*3 convolution window lie in the window overlap region, and the pixel data in the overlap region correspond to output row0 of the first selector S1 and output row1 of the second selector S2. As the kernel window continues to slide to the right, the displacement selection control module shifts the columns of the image pixel data supplied by the row buffering module, so the overlap region changes; both P21 and L21 take part in the convolution, while the data in the 3*3 convolution kernel control logic stay fixed. When the kernel window has finished sliding to the right, it moves down one row and slides across the input image horizontally again, and P21 and L21 remain within the overlap region and take part in the convolution. In the embodiment shown in Fig. 6, the result of the convolutional calculation module is:
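$$
Y(0,0) = P_{32}\cdot 0 + P_{31}\cdot 0 + P_{30}\cdot 0 + P_{22}\cdot 0 + P_{21}L_{32} + P_{20}L_{31} + P_{12}\cdot 0 + P_{11}L_{22} + P_{10}L_{21} = P_{21}L_{32} + P_{20}L_{31} + P_{11}L_{22} + P_{10}L_{21}
$$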
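The corner formula can be cross-checked against a generic zero-padded 3*3 correlation. The sketch assumes, as the formula implies, that L32 and L31 hold the first two pixels of image row 0, L22 and L21 the first two pixels of row 1, and that the kernel rows are P32 P31 P30 / P22 P21 P20 / P12 P11 P10; the function names and sample values are illustrative only.

```python
def corner_from_registers(P, L):
    """Y(0,0) exactly as written in the text."""
    return P["P21"] * L["L32"] + P["P20"] * L["L31"] + P["P11"] * L["L22"] + P["P10"] * L["L21"]


def zero_padded_output(image, kernel, r, c):
    """Generic 3*3 correlation at (r, c); positions outside the image contribute 0."""
    h, w = len(image), len(image[0])
    acc = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                acc += kernel[dr + 1][dc + 1] * image[rr][cc]
    return acc


kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]         # rows: P32 P31 P30 / P22 P21 P20 / P12 P11 P10
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
P = {"P21": kernel[1][1], "P20": kernel[1][2], "P11": kernel[2][1], "P10": kernel[2][2]}
L = {"L32": image[0][0], "L31": image[0][1], "L22": image[1][0], "L21": image[1][1]}
print(corner_from_registers(P, L), zero_padded_output(image, kernel, 0, 0))  # 940 940
```

The five zero terms in the equation are exactly the kernel positions that fall outside the image in the generic version.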
In an embodiment of the utility model, as shown in Fig. 2, the displacement selection control module further includes edge filling logic, comprising a filling selector, edge detection logic and pixel filling logic. The edge detection logic is connected to both the pixel filling logic and the filling selector. Through the 3*3 convolution window it takes the 8 pixels surrounding a pixel to be detected (centered on that pixel), computes the Sobel operator on them and compares the result with a preset threshold to judge the pixel at that input image address: if the computed value is greater than the threshold, the pixel is determined to be an edge. The judgment result signal is output to the pixel filling logic to control the pixel filling operation, and also to the select input of the filling selector to control how pixel data are written into the convolutional calculation module.
Filling selector S10: has a filled input and an unfilled input, and selects how pixel data are written into the convolutional calculation module according to the edge detection logic's judgment of the pixel to be detected. If the edge detection logic determines that the pixel data shifted out of the 3*3 convolution window control logic belong to an edge pixel, the pixel data are first processed by the pixel filling logic and then written into the convolutional calculation module through the filled input of S10; otherwise, the pixel data shifted out of the 3*3 convolution window control logic are written directly into the convolutional calculation module through the unfilled input of S10.
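The edge test and the S10 routing can be sketched as follows. The patent names the Sobel operator and a preset threshold but not the magnitude formula, so this sketch assumes the |Gx| + |Gy| approximation often used in hardware; `fill` is a hypothetical stand-in for the pixel filling logic, and all function names are illustrative.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(win):
    """Gradient magnitude of a 3*3 window, approximated as |Gx| + |Gy|."""
    gx = sum(SOBEL_X[r][c] * win[r][c] for r in range(3) for c in range(3))
    gy = sum(SOBEL_Y[r][c] * win[r][c] for r in range(3) for c in range(3))
    return abs(gx) + abs(gy)


def is_edge(win, threshold):
    """Edge when the computed value is greater than the preset threshold."""
    return sobel_magnitude(win) > threshold


def filling_selector(win, threshold, fill):
    """Model of S10: route through the filling logic only when the pixel is judged an edge."""
    return fill(win) if is_edge(win, threshold) else win


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
step = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
print(sobel_magnitude(flat), sobel_magnitude(step))  # 0 36
```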
Pixel filling logic: symmetrically fills the pixel data shifted out of the 3*3 convolution window control logic according to the edge detection logic's judgment of the pixel to be detected. Specifically, the boundary pixel of the input image determined by the edge detection logic is taken as the center of symmetry, and the pixels on the inside of the input image around that boundary pixel are mirrored about it to the outside of the input image, so that the image matrix framed by the 3*3 convolution window and the convolution kernel data complete a planar convolution centered on the boundary pixel. The result is then output to the filled input of the filling selector. In this way, the logic judges whether the input image address of the pixel data read into the 3*3 convolution window control logic lies on the boundary of the input image: pixel data on the edge of the input image are filled before being output, and pixel data not on the edge are output directly. The pixel data in the edge region of the input image can therefore take part in the convolution without being restricted by the window size built into the 3*3 convolution window control logic.
One embodiment of the edge-pixel filling is shown in Fig. 3. When the edge detection logic detects that pixel a32, in the first row and first column of the input image, lies at the image boundary, the pixel filling logic takes a32 as the center of symmetry and mirrors the pixels a31, a22 and a21 on the inside of the input image through a32 to the outside of the input image: a21 is filled to the upper left of a32, a22 is filled directly above a32, and a31 is filled to the left of a32. Correspondingly, when the edge detection logic detects pixel a31 in the first row and second column, the inner pixel a21 is filled directly above a31; and when it detects pixel a22 in the second row and first column, the inner pixel a21 is filled to the left of a22.
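The Fig. 3 example can be reproduced with a small sketch that mirrors each out-of-image row or column index through the boundary pixel. Treating the two axes separately is an assumption, but it agrees with all three cases in the text (a32, a31 and a22); `mirrored_neighbourhood` is a hypothetical name, and only the four labels from the text are used.

```python
def mirrored_neighbourhood(image, r, c):
    """3*3 neighbourhood of pixel (r, c); any row or column index outside the image is
    mirrored through (r, c), so the boundary pixel itself is the centre of symmetry."""
    h, w = len(image), len(image[0])
    out = []
    for dr in (-1, 0, 1):
        row = []
        for dc in (-1, 0, 1):
            rr = r + dr if 0 <= r + dr < h else r - dr
            cc = c + dc if 0 <= c + dc < w else c - dc
            row.append(image[rr][cc])
        out.append(row)
    return out


img = [["a32", "a31"],
       ["a22", "a21"]]
for line in mirrored_neighbourhood(img, 0, 0):
    print(line)
# ['a21', 'a22', 'a21']
# ['a31', 'a32', 'a31']
# ['a21', 'a22', 'a21']
```

The printed window shows a21 at the upper left of a32, a22 directly above it and a31 to its left, matching the filling described for Fig. 3.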
In an embodiment of the utility model, the main control module includes a host state machine whose working states are: initial state IDLE/0, first-row initial write state 1, second-row initial write state 2, first-row convolution state 6, third-row write state 5, second-row convolution state 7, first-row write state 3, third-row convolution state 8, second-row write state 4, and convolution end state 9. The host state machine is driven by the AHB interface control state machine, and its state transitions are shown in Fig. 4. When the control start signal start goes high, the machine moves from the initial state IDLE/0 to the first-row initial write state 1.
In the first-row initial write state 1, while the first-row read/write enable signal w_r[0]=0, the pixel data of the first preset row are read by AHB bus burst and written into the first line buffer. When w_r[0]=1, the pixel data of the first preset row have been completely written into the first line buffer, and the machine enters the second-row initial write state 2; otherwise it remains in the first-row initial write state 1.
In the second-row initial write state 2, while the second-row read/write enable signal w_r[1]=0, the AHB bus is controlled to burst-read the pixel data of the second preset row and write them into the second line buffer. When w_r[1]=1, the pixel data of the second preset row have been completely written into the second line buffer, and the machine enters the first-row convolution state 6; otherwise it remains in the second-row initial write state 2.
In the first-row convolution state 6, while the first-row convolution enable signal c_r[0]=0, the pixel data in the first line buffer are read out and, according to the column address enable signal col, shifted and written into the convolutional calculation module for convolution. When col=0 and c_r[0]=1, the pixel data of the first line buffer have completed convolution in the convolutional calculation module, but the displacement selection control module has not finished shifting the pixel data read by AHB bus burst, so the machine enters the third-row write state 5. When col=1 and c_r[0]=1, the pixel data of the first line buffer have completed convolution and the displacement selection control module has finished shifting the burst-read pixel data, so the machine enters the convolution end state 9. Otherwise it remains in the first-row convolution state 6. Here, the pixel data read by AHB bus burst are the pixel data of the first and second preset rows.
In the third-row write state 5, while the third-row read/write enable signal w_r[2]=0 or col=0, the pixel data of the third preset row are read by AHB bus burst and written into the third line buffer. When col=1 or w_r[2]=1, the pixel data of the third preset row have been completely written into the third line buffer, the pixel data in the second line buffer start being written into the convolutional calculation module for convolution, and the machine enters the second-row convolution state 7; otherwise it remains in the third-row write state 5.
In the second-row convolution state 7, while the second-row convolution enable signal c_r[1]=0, the pixel data in the second line buffer are read out and, according to the column address enable signal col, shifted and written into the convolutional calculation module for convolution. When col=0 and c_r[1]=1, the pixel data of the second line buffer have completed convolution in the convolutional calculation module, but the displacement selection control module has not finished shifting the pixel data read by AHB bus burst, so the machine enters the first-row write state 3. When col=1 and c_r[1]=1, the pixel data of the second line buffer have completed convolution and the displacement selection control module has finished shifting the burst-read pixel data, so the machine enters the convolution end state 9. Otherwise it remains in the second-row convolution state 7.
In the first-row write state 3, while w_r[0]=0, the AHB bus is controlled to burst-read the updated pixel data of the first preset row and write them into the first line buffer; the data previously held in the first line buffer were already read out in the first-row convolution state 6. When col=1 or w_r[0]=1, the updated pixel data of the first preset row have been completely written into the first line buffer, the pixel data in the third line buffer start being written into the convolutional calculation module for convolution, and the machine enters the third-row convolution state 8; otherwise it remains in the first-row write state 3.
In the third-row convolution state 8, while the third-row convolution enable signal c_r[2]=0, the pixel data in the third line buffer are read out and, according to the column address enable signal col, shifted and written into the convolutional calculation module for convolution. When col=0 and c_r[2]=1, the pixel data of the third line buffer have completed convolution in the convolutional calculation module, but the displacement selection control module has not finished shifting the pixel data read by AHB bus burst, so the machine enters the second-row write state 4. When col=1 and c_r[2]=1, the pixel data of the third line buffer have completed convolution and the displacement selection control module has finished shifting the burst-read pixel data, so the machine enters the convolution end state 9. Otherwise it remains in the third-row convolution state 8.
In the second-row write state 4, while w_r[1]=0, the AHB bus is controlled to burst-read the updated pixel data of the second preset row and write them into the second line buffer; the data previously held in the second line buffer were already read out in the second-row convolution state 7. When col=1 or w_r[1]=1, the updated pixel data of the second preset row have been completely written into the second line buffer, the pixel data in the first line buffer start being written into the convolutional calculation module for convolution, and the machine enters the first-row convolution state 6; otherwise it remains in the second-row write state 4.
In the convolution end state 9, the host state machine controls the AHB bus to write the data of the convolution results FIFO back to the convolution results storage array. When the AHB interface ready signal hready goes high, the machine returns to the initial state IDLE/0. It then continues to burst-read new rows of pixel data of the input image from the pixel storage array and repeats the above state transitions to process the newly input pixel data.
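The host state machine transitions described above can be summarized as a table-driven next-state function. This is a software sketch only: the state names are hypothetical (only the numbers come from the text), the third-row write handshake is assumed to use bit w_r[2] by analogy with c_r[2], and each write or convolution state simply stays put until its exit condition holds.

```python
from enum import IntEnum


class Host(IntEnum):
    """Host state machine states, numbered as in the text (names are hypothetical)."""
    IDLE = 0
    ROW1_FIRST_WRITE = 1
    ROW2_FIRST_WRITE = 2
    ROW1_WRITE = 3
    ROW2_WRITE = 4
    ROW3_WRITE = 5
    ROW1_CONV = 6
    ROW2_CONV = 7
    ROW3_CONV = 8
    CONV_END = 9


# Initial write states: leave only when the row's w_r bit is set.
FIRST_WRITES = {Host.ROW1_FIRST_WRITE: (0, Host.ROW2_FIRST_WRITE),
                Host.ROW2_FIRST_WRITE: (1, Host.ROW1_CONV)}
# Steady-state write states: leave when the w_r bit is set or col == 1.
WRITES = {Host.ROW3_WRITE: (2, Host.ROW2_CONV),
          Host.ROW1_WRITE: (0, Host.ROW3_CONV),
          Host.ROW2_WRITE: (1, Host.ROW1_CONV)}
# Convolution states: once c_r is set, col == 0 -> next write state, col == 1 -> CONV_END.
CONVS = {Host.ROW1_CONV: (0, Host.ROW3_WRITE),
         Host.ROW2_CONV: (1, Host.ROW1_WRITE),
         Host.ROW3_CONV: (2, Host.ROW2_WRITE)}


def host_next(s, start=0, w_r=(0, 0, 0), c_r=(0, 0, 0), col=0, hready=0):
    """Return the next host state for one evaluation of the input signals."""
    if s == Host.IDLE:
        return Host.ROW1_FIRST_WRITE if start else s
    if s in FIRST_WRITES:
        bit, nxt = FIRST_WRITES[s]
        return nxt if w_r[bit] else s
    if s in WRITES:
        bit, nxt = WRITES[s]
        return nxt if (w_r[bit] or col) else s
    if s in CONVS:
        bit, nxt = CONVS[s]
        if not c_r[bit]:
            return s
        return Host.CONV_END if col else nxt
    if s == Host.CONV_END:
        return Host.IDLE if hready else s
    raise ValueError(f"unknown state {s!r}")
```

Stepping through it shows the steady-state loop 6 → 5 → 7 → 3 → 8 → 4 → 6, in which each line buffer is refilled right after the row it held has been convolved.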
Specifically, the host state machine further includes a ring counter that generates the status signal corresponding to the working state of the host state machine. Under the control of the status signal and the column address enable signal, after the current row of pixel data has been convolved in the convolutional calculation module, the next row of pixel data is written column by column into the convolutional calculation module according to the column address signal, while the next unprocessed row in the pixel storage array is burst-read over the AHB bus into the emptied line buffer of the row buffering module. The host state machine thus cycles through the first-row write state 3, the second-row write state 4 and the third-row write state 5, traversing the different rows of the input image in the pixel storage array until the pixel data of all rows have been written into the convolutional calculation module and convolved. The count values of the ring counter correspond to the first-row write state 3, the second-row write state 4 and the third-row write state 5 respectively, and thereby generate the status signal.
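A ring counter of this kind can be sketched as a single set bit rotating through three positions. The patent does not say whether the counter is one-hot or how it is encoded, so the one-hot form, the class name and the `target_buffer` helper are assumptions; the visiting order 5, 3, 4 follows from the host state transitions (the first refill after the two initial writes goes to the third line buffer).

```python
class RingCounter:
    """Hypothetical 3-bit one-hot ring counter standing for write states 5, 3, 4."""

    ORDER = (5, 3, 4)  # order in which the host machine visits its write states

    def __init__(self):
        self.bits = [1, 0, 0]  # exactly one bit is set at any time

    def tick(self):
        """Rotate the single set bit one position (called after each completed row write)."""
        self.bits = [self.bits[-1]] + self.bits[:-1]

    def write_state(self):
        return self.ORDER[self.bits.index(1)]

    def target_buffer(self):
        """0-based line buffer written in the current write state (state 3 -> first buffer)."""
        return {3: 0, 4: 1, 5: 2}[self.write_state()]


rc = RingCounter()
seen = []
for _ in range(4):
    seen.append(rc.write_state())
    rc.tick()
print(seen)  # [5, 3, 4, 5]
```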
The main control module further includes a convolution read/write control state machine, which changes state while the host state machine is in the first-row convolution state 6, the second-row convolution state 7 or the third-row convolution state 8. It enables the 3*3 convolution window control logic to read in parallel the data on output row0 of the first selector S1, output row1 of the second selector S2 and output row2 of the third selector S3, to perform the shifted convolution calculation, and to write the convolution results into the convolution results FIFO; the data are then read from the convolution results FIFO and written out on the AHB interface under the drive of the AHB interface control state machine. Because each convolution calculation is completed within 3 clock cycles, the timing of the main control module has ample margin.
In an embodiment of the utility model, the working states of the convolution read/write control state machine are: initial state IDLE, first read state RD_ROWO, read row buffer state RD_BUF, shift-and-write-FIFO state SHFT, write FIFO wait state SHFT_WAIT, write bus state BWR, write bus wait state BWR_WAIT, and write complete state BWR_END. The operation of the convolution read/write control state machine is shown in Fig. 5:
When the host state machine is in the first-row convolution state 6, the second-row convolution state 7 or the third-row convolution state 8, control of the convolution starts, and the convolution read/write control state machine jumps from the initial state IDLE to the first read state RD_ROWO. In RD_ROWO, the initial pixel data in the first line buffer are read and selected by the displacement selection control module into the 3*3 convolution window; during this process, the shift registers in the displacement selection control module begin the column-change operation on the pixel data, and the state then moves to the read row buffer state RD_BUF.
In the read row buffer state RD_BUF, the displacement selection control module reads the remaining pixel data in the row buffering module (all except the data already read from the first line buffer in RD_ROWO) and shifts them column by column into the 3*3 convolution window, so the register positions in the 3*3 convolution window of the data read and stored in RD_ROWO change. Once the displacement selection control module has read the current row of image pixel data from the row buffering module, it performs a row-change operation to read the next row of pixel data of the input image, and the state moves to the shift-and-write-FIFO state SHFT.
Because the convolution applies the convolution kernel data to a sliding window over the image data, the shift logic must shift the image data accordingly; the shift logic reads the current convolution operation sequence number. In the embodiment of the utility model, the pixel data of the corresponding rows of the input image in the row buffering module are selected according to the status signal output by the main control module, and each column of pixel data is then shifted in parallel and cached, so that the pixel data written into the convolutional calculation module complete the convolution as the window slides in the convolutional calculation module. When the convolution is set up, the data stored in the window built into the displacement selection control module determine the extent of the pixel data to be convolved in the two dimensions of the image plane (the horizontal and vertical directions of the pixels). According to these two sizes, the displacement selection control module changes the order of the image data sequence (row change and column change), and the status signal output by the main control module determines how far the image data sequence is shifted, so that in each convolution the pixel data of one image row entering the convolutional calculation module are aligned with the convolution kernel data.
In the shift-and-write-FIFO state SHFT, while the count value of the generated shift counter wa_cout is not equal to 3, the pixel data in the 3*3 convolution window are shifted column by column and passed to the convolutional calculation module; convolution can be performed only after the pixel data of one row of the input image have been written into the convolutional calculation module. When the count value of the generated read counter r_cout is not 0, wa_cout equals 3, and the data depth D_F written into the convolution results FIFO is less than the burst write length B_L, the convolutional calculation module has completed one convolution, but fewer than B_L convolution results have been written into the convolution results FIFO and the shift registers of the 3*3 convolution window do not yet hold a complete row of image pixel data; the machine therefore returns to RD_BUF to continue shifting the read image pixel data column by column into the 3*3 convolution window, and then performs one row-change operation on the input image. When r_cout equals 0 and wa_cout equals 3, the shift registers of the 3*3 convolution window have traversed all pixel data of the input image and all corresponding input image pixel data have been written into the convolutional calculation module for convolution, so the state moves to the write FIFO wait state SHFT_WAIT. When r_cout equals 0, wa_cout equals 3 and D_F is greater than or equal to B_L, all pixel data of the input image have been written into the convolutional calculation module for convolution and at least B_L convolution results have been written into the convolution results FIFO, so the state moves to the write bus state BWR. In SHFT, the pixel data input in parallel are convolved, a row change is made on the input image after each convolution, and the next convolution follows, until wa_cout reaches 3.
In the write FIFO wait state SHFT_WAIT, the count value of wa_cout is not equal to 3, and no new convolution results are supplied to the convolution results FIFO module. When all current calculation results in the convolutional calculation module have been written into the convolution results FIFO module and its storage depth D_F is greater than or equal to the burst write length B_L configured for the AHB bus, the state moves to the write bus state BWR.
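The FIFO-to-bus flow (queue results, write a burst once the depth D_F reaches B_L, then write out the remainder at the end) can be sketched in a few lines. This models only the data movement, not the handshaking; `drain_in_bursts` is a hypothetical name and the burst length used below is arbitrary.

```python
from collections import deque


def drain_in_bursts(results, burst_len):
    """Queue convolution results; write a burst of burst_len whenever the FIFO holds enough,
    then flush whatever is left once no more results arrive."""
    fifo = deque()
    bursts = []
    for value in results:
        fifo.append(value)
        if len(fifo) >= burst_len:               # depth D_F has reached burst length B_L
            bursts.append([fifo.popleft() for _ in range(burst_len)])
    if fifo:                                     # remainder written after the last full burst
        bursts.append(list(fifo))
        fifo.clear()
    return bursts


print(drain_in_bursts(range(10), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```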
In the write bus state BWR, when the burst write flag B_W=1, the count value of the generated write counter wr_cout is non-zero, and the count value of r_cout equals 0, the convolution results FIFO has finished writing one burst of length B_L to the AHB bus, but the non-zero wr_cout shows that the calculation results of the convolutional calculation module have not all been written into the convolution results FIFO module, so the state moves to SHFT_WAIT. When B_W=1, wr_cout is non-zero and r_cout is non-zero, the input image pixel data have not all been written into the convolutional calculation module for convolution, so the state moves to RD_BUF. When B_W=1 and wr_cout=0, the convolution results FIFO module, having finished writing one burst to the AHB bus, writes the remaining convolution results in the FIFO module out to the AHB bus, and the state moves to the write bus wait state BWR_WAIT; each convolution result corresponds to the pixel data currently stored in the 3*3 convolution window. While the convolution results FIFO has not finished writing a complete burst of length B_L to the AHB bus, the machine remains in BWR.
In the write-bus wait state BWR_WAIT, when the write counter wr_cout = 0, the read counter r_cout = 0, or the AHB interface ready signal hready = 0, all convolution results in the convolution results FIFO module have been written onto the AHB bus and the state jumps to the write-complete state BWR_END; otherwise the state remains in BWR_WAIT. When hready = 1, the write-complete state BWR_END jumps back to the initial state IDLE, and the write-out of the processed input-image data over the AHB bus is complete.
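The write-side transitions described above (SHFT_WAIT, BWR, BWR_WAIT, BWR_END) can be collected into a single next-state function. This Python sketch is a hypothetical model rather than the patent's logic: the `Signals` fields, the `conv_pending` flag and the default burst length of 16 are assumptions. Because the translated BWR_WAIT condition is ambiguous, it is read here as `(wr_cout == 0 and r_cout == 0) or hready == 0`.

```python
from dataclasses import dataclass
from enum import Enum, auto


class WrState(Enum):
    IDLE = auto()
    RD_BUF = auto()
    SHFT_WAIT = auto()
    BWR = auto()
    BWR_WAIT = auto()
    BWR_END = auto()


@dataclass
class Signals:
    b_w: int = 0                 # burst-write flag: 1 once a full burst is on the bus
    wr_cout: int = 0             # write counter
    r_cout: int = 0              # read counter
    hready: int = 1              # AHB ready signal
    fifo_depth: int = 0          # D_F: words currently held in the results FIFO
    burst_len: int = 16          # B_L: configured burst length (assumed default)
    conv_pending: bool = False   # hypothetical: results still inside the conv module


def next_write_state(state: WrState, s: Signals) -> WrState:
    """Hypothetical next-state function for the write side of the FSM."""
    if state is WrState.SHFT_WAIT:
        # Leave only when the conv module is drained and D_F >= B_L.
        if not s.conv_pending and s.fifo_depth >= s.burst_len:
            return WrState.BWR
        return state
    if state is WrState.BWR:
        if s.b_w != 1:
            return state                 # burst not finished: stay in BWR
        if s.wr_cout == 0:
            return WrState.BWR_WAIT      # flush what is left in the FIFO
        if s.r_cout == 0:
            return WrState.SHFT_WAIT     # conv results not yet all in the FIFO
        return WrState.RD_BUF            # input pixels remain: read next row buffer
    if state is WrState.BWR_WAIT:
        # Assumed reading of the ambiguous source condition.
        if (s.wr_cout == 0 and s.r_cout == 0) or s.hready == 0:
            return WrState.BWR_END
        return state
    if state is WrState.BWR_END:
        return WrState.IDLE if s.hready == 1 else state
    return state                         # other states are not modelled here
```

Keeping the signals in one dataclass makes each transition a pure function of (state, inputs), which is how the branches would map onto a single combinational next-state block.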
Specifically, the count value of the shift counter wa_cout is the line-feed offset used when the pixel data stored in the 3*3 convolution window is written into the convolution calculation module, and the count value of the read counter r_cout is the column offset generated when the pixel data stored in the 3*3 convolution window is shifted in parallel. The changes in the count values of wa_cout and r_cout are output to the row buffer module as the column address signal, so that the convolution read/write control state machine can direct the shift selection control module to change lines and columns when reading and writing the pixel data of the input image.
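One possible reading of the two counters is sketched below. It assumes, following the SHFT description, that wa_cout is an outer row (line-feed) offset whose pass ends when it reaches 3, and that r_cout sweeps the columns and is driven directly as the row-buffer column address. `window_addresses` is a hypothetical name, and the circuit's actual address arithmetic may differ.

```python
from typing import Iterator


def window_addresses(width: int) -> Iterator[tuple[int, int]]:
    """Yield (wa_cout, r_cout) pairs for one pass over the three buffered rows.

    Hypothetical reading: wa_cout is the line-feed offset and the pass ends when
    it reaches 3; r_cout sweeps the columns and doubles as the column address
    sent to the row buffer module.
    """
    wa_cout = 0
    while wa_cout < 3:                   # SHFT ends once wa_cout reaches 3
        for r_cout in range(width):      # column address for the line buffers
            yield wa_cout, r_cout
        wa_cout += 1                     # line feed after a full row


print(list(window_addresses(2)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```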
The AHB interface control state machine has an initial state, a non-sequential transfer state and a sequential transfer state. It drives the main state machine and the convolution read/write control state machine to read and write data on the AHB bus, reflects the state of the burst reads and writes on the AHB bus controlled by the main control module, and derives its state-transition conditions from each working state of the main state machine, thereby implementing burst transfers of data on the AHB bus. When the AHB interface control state machine jumps from the initial state to the non-sequential transfer state, this marks the first data transfer of the burst read of one row of pixel data of the input image: the main control module burst-reads, over the AHB bus, the pixel data of the current adjacent rows of the input image from the pixel storage array, so that the convolution operation can be carried out as the 3*3 convolution window slides. While the AHB interface control state machine moves from the non-sequential transfer state to the sequential transfer state, the main control module writes, over the AHB bus, the results computed by the convolution calculation module for the pixel data of the current adjacent rows into the convolution results storage array. When the AHB interface control state machine returns from the sequential transfer state to the initial state, the burst read and burst write operations on the AHB bus controlled by the main control module have ended; the AHB interface ready signal hready is then driven high, and the main control module starts a new burst transfer.
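The three interface states line up with the AMBA AHB `HTRANS` transfer types IDLE, NONSEQ and SEQ. As an illustration of standard AHB behaviour rather than of this patent's controller, an incrementing burst begins with one NONSEQ beat, continues with SEQ beats at consecutive addresses, and then returns to IDLE. The helper `burst_beats`, the 4-byte beat size and the example base address are assumptions.

```python
from enum import IntEnum
from typing import Optional


class HTrans(IntEnum):
    """AMBA AHB HTRANS encodings (BUSY is not used in this sketch)."""
    IDLE = 0b00
    BUSY = 0b01
    NONSEQ = 0b10
    SEQ = 0b11


def burst_beats(base_addr: int, length: int,
                size_bytes: int = 4) -> list[tuple[HTrans, Optional[int]]]:
    """Return (HTRANS, HADDR) per cycle for an incrementing burst, then IDLE.

    Real AHB bursts must also not cross a 1 KB address boundary; that check
    is omitted here.
    """
    beats: list[tuple[HTrans, Optional[int]]] = []
    for i in range(length):
        kind = HTrans.NONSEQ if i == 0 else HTrans.SEQ    # first beat NONSEQ, rest SEQ
        beats.append((kind, base_addr + i * size_bytes))  # address steps by beat size
    beats.append((HTrans.IDLE, None))                     # bus returns to idle
    return beats


for kind, addr in burst_beats(0x1000, 4):
    print(kind.name, None if addr is None else hex(addr))
```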
Parts not described in detail in this specification belong to the common knowledge of those skilled in the art. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
Claims (8)
1. An accelerating circuit for a 3*3 convolution operation, comprising a DDR module for storing an input image and the results of the image convolution operation, and a convolution results FIFO module for buffering convolution results, wherein the DDR module comprises a pixel storage array configured with the base address and storage space of the input image, and a convolution results storage array configured with the destination address of the image convolution results; characterized in that the accelerating circuit comprises a main control module, a shift selection control module, a row buffer module and a convolution calculation module;
the main control module is configured to burst-read, over the AHB bus, the pixel data of the current two adjacent rows of the input image from the pixel storage array, and to control the parallel shifting of pixel data in the shift selection control module so that the pixel data written into the convolution calculation module each time is aligned with the matching convolution kernel data, which speeds up the calculation; the main control module then writes, over the AHB bus, the results computed by the convolution calculation module for the pixel data of the current adjacent rows into the convolution results storage array; the main control module is further configured to read, over the AHB bus, the pixel data of the row that follows the current adjacent rows in the input image, to perform the corresponding shifting and convolution processing, and, once the data of all pixels of the input image has been processed, to issue an interrupt informing the CPU of the result of the convolution processing, thereby reducing software instruction overhead; wherein the convolution kernel data is the data preconfigured by the CPU in the convolution window of the convolution calculation module; and aligning the pixel data written each time into the convolution calculation module with the matching convolution kernel data, in order to implement the convolution operation, means that, as the convolution window holding the kernel data is controlled to slide over the image matrix corresponding to the pixel data written into the convolution calculation module, an overlapping region containing the center of that convolution window is formed;
the row buffer module comprises line buffers built from shift registers, and is configured to buffer the pixel data of the corresponding rows of the input image according to the horizontal pixel length of the image, and to output the buffered pixel data of the input image to the shift selection control module according to the column address signal and the status signal generated by the main control module;
the shift selection control module is configured to select, according to the status signal output by the main control module, the pixel data of the corresponding row of the input image in the row buffer module, then to shift each column of pixel data in parallel and pad the pixel data, so that all pixel data of the input image is written into the convolution calculation module to complete the convolution operation;
the convolution calculation module is configured to multiply the pixel data output by the shift selection control module by the corresponding convolution kernel data, and to sum the products with an adder group to implement the convolution operation.
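The multiply-and-accumulate step of the convolution calculation module can be pictured as nine multipliers feeding a pairwise adder tree. In this hypothetical Python model the window and kernel are plain 3*3 lists and `conv3x3_mac` is an illustrative name. Like most image-processing hardware it computes a correlation (the kernel is not flipped); the claims do not say whether the kernel is flipped.

```python
def conv3x3_mac(window: list[list[int]], kernel: list[list[int]]) -> int:
    """Multiply a 3*3 pixel window by a 3*3 kernel and sum with an adder tree."""
    # Nine parallel multipliers, one per window position.
    products = [window[r][c] * kernel[r][c] for r in range(3) for c in range(3)]
    # Each loop iteration models one adder-tree level: 9 -> 5 -> 3 -> 2 -> 1.
    while len(products) > 1:
        level = [products[i] + products[i + 1] for i in range(0, len(products) - 1, 2)]
        if len(products) % 2:            # odd operand passes through to the next level
            level.append(products[-1])
        products = level
    return products[0]


pixels = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
print(conv3x3_mac(pixels, sobel_x))  # 8
```

A tree of depth four (the ceiling of log2 9) replaces eight sequential additions, which is the usual reason for summing products with an adder group rather than a single accumulator.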
2. The accelerating circuit according to claim 1, characterized in that the line buffers of the row buffer module comprise a first line buffer, a second line buffer and a third line buffer, each connected in parallel to the main control module through a column address select terminal and a state select terminal, wherein: the first line buffer is configured to buffer, under the control of the main control module, the pixel data of a first preset row of the input image burst-read over the AHB bus; the second line buffer is configured to buffer, under the control of the main control module, the pixel data of a second preset row of the input image burst-read over the AHB bus; the third line buffer is configured to buffer, under the control of the main control module, the pixel data of a third preset row of the input image burst-read over the AHB bus; the first, second and third preset rows are three mutually adjacent row indices of the input image burst-read over the AHB bus from the pixel storage array, and the pixel data corresponding to these three row indices is updated, after each burst read over the AHB bus, according to the rule of matrix convolution.
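A minimal model of the three line buffers is sketched below, assuming each buffer is a shift register as long as the image width and that image row r lands in buffer r % 3 (one natural reading of "updated according to the rule of matrix convolution"). `LineBuffer` and `load_rows` are hypothetical names; buffers are indexed 0 to 2 here rather than first to third.

```python
from collections import deque


class LineBuffer:
    """A line buffer modelled as a width-long shift register (hypothetical)."""

    def __init__(self, width: int) -> None:
        self.regs = deque([0] * width, maxlen=width)

    def shift_in(self, pixel: int) -> None:
        self.regs.append(pixel)          # the oldest pixel drops off the far end

    def read(self, col: int) -> int:
        return self.regs[col]            # read by column address


def load_rows(image: list[list[int]], upto_row: int, width: int) -> list[LineBuffer]:
    """Burst rows 0..upto_row into three buffers so row r ends up in buffer r % 3."""
    bufs = [LineBuffer(width) for _ in range(3)]
    for r in range(upto_row + 1):
        for pixel in image[r]:
            bufs[r % 3].shift_in(pixel)  # a newer row overwrites the row 3 above it
    return bufs
```

After rows 0 to 4 of a 4-pixel-wide image have been loaded, the buffers hold rows 3, 4 and 2, which are exactly the three most recent rows needed by the next window.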
3. The accelerating circuit according to claim 2, characterized in that the shift selection control module comprises a first selector, a second selector, a third selector and 3*3 convolution window control logic;
the first selector, the second selector and the third selector each have three input terminals, namely a first input terminal, a second input terminal and a third input terminal, wherein the first input terminal of the first selector is connected to the output terminal of the third line buffer, the second input terminal of the first selector is connected to the output terminal of the first line buffer, and the third input terminal of the first selector is connected to the output terminal of the second line buffer;
the first input terminal of the second selector is connected to the output terminal of the first line buffer, the second input terminal of the second selector is connected to the output terminal of the second line buffer, and the third input terminal of the second selector is connected to the output terminal of the third line buffer;
the first input terminal of the third selector is connected to the output terminal of the second line buffer, the second input terminal of the third selector is connected to the output terminal of the third line buffer, and the third input terminal of the third selector is connected to the output terminal of the first line buffer;
the 3*3 convolution window control logic comprises a 3*3 convolution window formed by a first shift register, a second shift register and a third shift register, each of which is composed of three registers; the input terminal of the first shift register is connected to the output terminal of the first selector and buffers the pixel data that the first selector selects into the first row of the 3*3 convolution window; the input terminal of the second shift register is connected to the output terminal of the second selector and buffers the pixel data that the second selector selects into the second row of the 3*3 convolution window; the input terminal of the third shift register is connected to the output terminal of the third selector and buffers the pixel data that the third selector selects into the third row of the 3*3 convolution window;
wherein one address input terminal of the first selector is connected to one address input terminal of the second selector, another address input terminal of the second selector is connected to one address input terminal of the third selector, and another address input terminal of the third selector is connected to the main control module to receive the status signal.
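The selector wiring in claim 3 amounts to a cyclic rotation of the three line buffers. The Python sketch below tabulates it, assuming (the claim is not explicit on this) that all three selectors are driven by the same select index 0, 1 or 2 derived from the status signal. `SELECTOR_INPUTS` and `window_rows` are hypothetical names.

```python
# Line-buffer number on (input 1, input 2, input 3) of each selector, per claim 3.
SELECTOR_INPUTS = {
    "first": (3, 1, 2),
    "second": (1, 2, 3),
    "third": (2, 3, 1),
}


def window_rows(select: int) -> tuple[int, ...]:
    """Line buffers feeding window rows 1..3 for a shared select index (assumed)."""
    if select not in (0, 1, 2):
        raise ValueError("select must be 0, 1 or 2")
    return tuple(SELECTOR_INPUTS[name][select] for name in ("first", "second", "third"))


for s in range(3):
    print(s, window_rows(s))
```

With select 0 the window rows come from buffers (3, 1, 2), with select 1 from (1, 2, 3) and with select 2 from (2, 3, 1). The middle window row is therefore always buffer `select + 1`, flanked by its cyclic neighbours, so only one buffer has to be refilled per output row.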
4. The accelerating circuit according to claim 3, characterized in that the shift selection control module further comprises edge padding logic, which includes a padding selector, edge detection logic and pixel padding logic;
the edge detection logic is connected to the pixel padding logic and to the padding selector, and is configured to determine the address position, within the input image, of the pixel under test in the 3*3 convolution window, and to output the resulting decision signal to the pixel padding logic and to the select terminal of the padding selector;
the padding selector has a padded input terminal and an unpadded input terminal, and is configured to select, according to the decision signal of the edge detection logic received at its select terminal, the pixel data of the corresponding input terminal to be written into the convolution calculation module;
the pixel padding logic is configured to pad symmetrically, according to the decision signal of the edge detection logic, the pixel data shifted out of the 3*3 convolution window control logic, so that the image matrix enclosed by the 3*3 convolution window centered on a boundary pixel can complete a planar convolution with the convolution kernel data, and to output the result to the padded input terminal of the padding selector.
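Edge handling can be sketched as an edge-detection test plus a mirrored index lookup. The claim says only that the pixels are padded "symmetrically", so the sketch offers both common conventions: `symmetric` (edge pixel repeated, -1 maps to 0) and `reflect` (edge pixel not repeated, -1 maps to 1). The function names are hypothetical.

```python
def mirror_index(i: int, n: int, mode: str = "symmetric") -> int:
    """Map an index one step outside [0, n) back inside by mirroring at the border."""
    if 0 <= i < n:
        return i
    if mode == "symmetric":              # edge pixel repeated: -1 -> 0, n -> n-1
        return -i - 1 if i < 0 else 2 * n - i - 1
    if mode == "reflect":                # edge pixel not repeated: -1 -> 1, n -> n-2
        return -i if i < 0 else 2 * n - i - 2
    raise ValueError(f"unknown mode {mode!r}")


def padded_window(image: list[list[int]], row: int, col: int,
                  mode: str = "symmetric") -> tuple[bool, list[list[int]]]:
    """Return (edge decision, 3*3 window centered on image[row][col])."""
    h, w = len(image), len(image[0])
    on_edge = row in (0, h - 1) or col in (0, w - 1)    # edge-detection decision
    window = [[image[mirror_index(row + dr, h, mode)][mirror_index(col + dc, w, mode)]
               for dc in (-1, 0, 1)]
              for dr in (-1, 0, 1)]
    return on_edge, window
```

For interior pixels `mirror_index` is the identity, so the same lookup yields the unpadded window; in hardware the decision flag would instead drive the padding selector between its two inputs.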
5. The accelerating circuit according to claim 2, characterized in that the main control module comprises a main state machine whose working states include a first-time first-row write state, a first-time second-row write state, a first-row convolution state, a third-row write state, a second-row convolution state, a first-row write state, a third-row convolution state and a second-row write state;
the main state machine is configured, in the first-time first-row write state, to control, through a first-row read/write enable signal, the AHB bus to burst-read for the first time the pixel data of the first preset row and write it into the first line buffer;
in the first-time second-row write state, to control, through a second-row read/write enable signal, the AHB bus to burst-read for the first time the pixel data of the second preset row and write it into the second line buffer;
in the first-row convolution state, to read out the pixel data in the first line buffer under the control of a first-row convolution enable signal, and to shift the read-out pixel data, according to a column address enable signal, into the convolution calculation module for the convolution operation;
in the third-row write state, to control, through a third-row read/write enable signal, the AHB bus to burst-read the pixel data of the third preset row and write it into the third line buffer;
in the second-row convolution state, to read out the pixel data in the second line buffer under the control of a second-row convolution enable signal, and to shift the read-out pixel data, according to the column address enable signal, into the convolution calculation module for the convolution operation;
in the first-row write state, to control, through the first-row read/write enable signal, the AHB bus to burst-read the pixel data of the updated first preset row and write it into the first line buffer;
in the third-row convolution state, to read out the pixel data in the third line buffer under the control of a third-row convolution enable signal, and to shift the read-out pixel data, according to the column address enable signal, into the convolution calculation module for the convolution operation;
in the second-row write state, to control, through the second-row read/write enable signal, the AHB bus to burst-read the pixel data of the updated second preset row and write it into the second line buffer;
the main state machine further comprises a ring counter for generating the status signal corresponding to each working state of the main state machine;
wherein, under the control of the status signal, after the pixel data of the current row has been convolved in the convolution calculation module, the pixel data of the next row is written column by column into the convolution calculation module, and the pixel data of the rows still to be processed in the pixel storage array continues to be burst-read over the AHB bus into the line buffer that has been emptied, so that the pixel data read in the first-row write state, the second-row write state and the third-row write state is continuously updated, until the pixel data of all rows of the input image in the pixel storage array has been written into the convolution calculation module and the convolution operation is complete.
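To check that the state order in claim 5 always keeps the right three rows on hand, the Python sketch below simulates it and compares the result with a direct 3*3 convolution. Assumptions, none of them stated in the claims: after the two first-time writes a ring counter cycles through CONV1, WRITE3, CONV2, WRITE1, CONV3, WRITE2; CONVk uses the row held in line buffer k as the window's center row; and the borders are padded by repeating the edge pixel, which for a one-pixel border is the same as symmetric padding. All names are hypothetical.

```python
from itertools import cycle

# Assumed steady-state order of the main state machine after the first-time writes.
RING = ("CONV1", "WRITE3", "CONV2", "WRITE1", "CONV3", "WRITE2")


def host_schedule(height: int) -> list[tuple[str, int]]:
    """(state, image row) pairs: the row each state fetches or convolves."""
    schedule = [("FIRST_WRITE1", 0)]
    if height > 1:
        schedule.append(("FIRST_WRITE2", 1))
    next_load, next_conv = 2, 0
    for state in cycle(RING):                # the ring counter drives the sequence
        if next_conv >= height:
            break
        if state.startswith("CONV"):
            schedule.append((state, next_conv))
            next_conv += 1
        elif next_load < height:             # no rows left to fetch: write states idle
            schedule.append((state, next_load))
            next_load += 1
    return schedule


def border_index(i: int, n: int) -> int:
    """One-pixel symmetric padding (equivalent to repeating the edge pixel)."""
    return 0 if i < 0 else n - 1 if i >= n else i


def conv_row(above: list[int], center: list[int], below: list[int],
             kernel: list[list[int]]) -> list[int]:
    rows, w = (above, center, below), len(center)
    return [sum(rows[kr][border_index(c + kc - 1, w)] * kernel[kr][kc]
                for kr in range(3) for kc in range(3)) for c in range(w)]


def simulate(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
    h, bufs, out = len(image), {}, []
    for state, row in host_schedule(h):
        k = int(state[-1])                   # line buffer number from the state name
        if "WRITE" in state:
            bufs[k] = list(image[row])       # AHB burst read into line buffer k
            continue
        center = bufs[k]
        above = bufs[(k - 2) % 3 + 1] if row > 0 else center      # pad top edge
        below = bufs[k % 3 + 1] if row < h - 1 else center        # pad bottom edge
        out.append(conv_row(above, center, below, kernel))
    return out


def reference(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
    h, w = len(image), len(image[0])
    return [[sum(image[border_index(r + kr - 1, h)][border_index(c + kc - 1, w)]
                 * kernel[kr][kc] for kr in range(3) for kc in range(3))
             for c in range(w)] for r in range(h)]
```

Under these assumptions `simulate(image, kernel) == reference(image, kernel)` holds for any image at least one row high. Three buffers suffice because row r + 2 is fetched only after row r has been convolved, which is exactly when row r - 1, the row it overwrites, stops being needed.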
6. The accelerating circuit according to claim 5, characterized in that the main control module further comprises a convolution read/write control state machine, configured to perform state transitions while the main state machine is in the first-row convolution state, the second-row convolution state or the third-row convolution state;
the working states of the convolution read/write control state machine include a first-read state, a read-row-buffer state, a shift-write-FIFO state, a write-FIFO wait state, a write-bus state and a write-bus wait state;
the convolution read/write control state machine is configured, in the first-read state, to read the pixel data of the first column in the first line buffer and select it into the 3*3 convolution window;
in the read-row-buffer state, to read the pixel data in the row buffer module other than the first column of the first line buffer and select it into the 3*3 convolution window;
in the shift-write-FIFO state, to shift the pixel data in the 3*3 convolution window column by column into the convolution calculation module for convolution according to the count value of the generated shift counter, and to move the 3*3 convolution window to the next line and pass it to the convolution calculation module for convolution according to the count value of the generated read counter;
in the write-FIFO wait state, to write the calculation results of the convolution calculation module into the convolution results FIFO module until the storage depth of the convolution results FIFO module is greater than or equal to the burst write data length configured for the AHB bus;
in the write-bus state, to write the convolution results stored in the convolution results FIFO module onto the AHB bus according to the count value of the generated write counter, until the results of the convolution over the pixel data of all rows and columns of the input image in the pixel storage array have all been written into the convolution results FIFO module;
in the write-bus wait state, to determine, according to the count value of the write counter or of the read counter, that the results of the convolution over the pixel data of all rows and columns of the input image in the pixel storage array have been written onto the AHB bus;
wherein the count value of the shift counter is the row index at which the pixel data stored in the 3*3 convolution window is written into the convolution calculation module, and the count value of the read counter is the column index generated when the pixel data stored in the 3*3 convolution window is shifted in parallel.
7. The accelerating circuit according to claim 6, characterized in that the changes in the count values of the shift counter and the read counter are output to the row buffer module as the column address signal.
8. The accelerating circuit according to claim 7, characterized in that the main control module further comprises an AHB interface control state machine, configured to drive the main state machine and the convolution read/write control state machine to read and write data on the AHB bus, and to determine its state-transition conditions from each working state of the main state machine, so as to implement burst transfers of data on the AHB bus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201821189844.5U CN208766715U (en) | 2018-07-26 | 2018-07-26 | The accelerating circuit of 3*3 convolution algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201821189844.5U CN208766715U (en) | 2018-07-26 | 2018-07-26 | The accelerating circuit of 3*3 convolution algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN208766715U (en) | 2019-04-19 |
Family
ID=66129396
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201821189844.5U (Withdrawn - After Issue) CN208766715U (en) | The accelerating circuit of 3*3 convolution algorithm | 2018-07-26 | 2018-07-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN208766715U (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108681984A (en) * | 2018-07-26 | 2018-10-19 | 珠海市微半导体有限公司 | Accelerating circuit of 3*3 convolution algorithms |
CN108681984B (en) * | 2018-07-26 | 2023-08-15 | 珠海一微半导体股份有限公司 | Acceleration circuit of 3*3 convolution algorithm |
CN110222818A (en) * | 2019-05-13 | 2019-09-10 | 西安交通大学 | Multi-bank row/column interleaved read/write method for convolutional neural network data storage |
CN110647978A (en) * | 2019-09-05 | 2020-01-03 | 北京三快在线科技有限公司 | System and method for extracting convolution window in convolution neural network |
CN111080507A (en) * | 2019-11-18 | 2020-04-28 | 中国航空工业集团公司西安航空计算技术研究所 | TLM microstructure for GPU hardware image processing convolution filtering system |
CN111080507B (en) * | 2019-11-18 | 2022-12-06 | 中国航空工业集团公司西安航空计算技术研究所 | TLM microstructure for GPU hardware image processing convolution filtering system |
CN111679286A (en) * | 2020-05-12 | 2020-09-18 | 珠海市一微半导体有限公司 | Laser positioning system and chip based on hardware acceleration |
CN111679286B (en) * | 2020-05-12 | 2022-10-14 | 珠海一微半导体股份有限公司 | Laser positioning system and chip based on hardware acceleration |
CN113489925A (en) * | 2021-06-01 | 2021-10-08 | 中国科学院上海技术物理研究所 | Focal plane detector reading circuit for realizing convolution calculation |
CN113489925B (en) * | 2021-06-01 | 2022-07-08 | 中国科学院上海技术物理研究所 | Focal plane detector reading circuit for realizing convolution calculation |
Similar Documents
Publication | Title |
---|---|
CN208766715U (en) | The accelerating circuit of 3*3 convolution algorithm |
CN108681984A (en) | Accelerating circuit of 3*3 convolution algorithms |
CN102208005B (en) | 2-dimensional (2-D) convolver |
US20180189643A1 (en) | Convolution circuit, application processor including the same, and operating method thereof |
CN101441271B (en) | SAR real time imaging processing device based on GPU |
CN103020890B (en) | Visual processing apparatus based on multi-level parallel processing |
CN111414994B (en) | FPGA-based Yolov3 network computing acceleration system and acceleration method thereof |
CN107392309A (en) | General-purpose FPGA-based fixed-point neural network convolution accelerator hardware structure |
CN106683158A (en) | Modeling structure of GPU texture mapping non-blocking memory Cache |
RU2623806C1 (en) | Method and device of processing stereo images |
GB2298111A (en) | Improvements relating to computer 3d rendering systems |
CN107748723A (en) | Storage method and access device supporting conflict-free stepping block-by-block access |
CN105550978B (en) | On-chip memory hierarchy of a GPU 3D engine for a unified shader architecture |
CN114092338B (en) | Image zooming fast calculation method |
CN114461978A (en) | Data processing method and device, electronic equipment and readable storage medium |
CN114359662A (en) | Implementation method of convolutional neural network based on heterogeneous FPGA and fusion multiresolution |
CN109446478A (en) | Complex covariance matrix computing system based on iterative and reconfigurable modes |
CN110515872A (en) | Direct memory access method, apparatus, dedicated computing chip and heterogeneous computing system |
CN104869284A (en) | High-efficiency FPGA implementation method and device for bilinear interpolation amplification algorithm |
CN117217274A (en) | Vector processor, neural network accelerator, chip and electronic equipment |
CN111275608B (en) | Remote sensing image orthorectification parallel system based on FPGA |
CN1105358C (en) | Semiconductor memory having arithmetic function, and processor using the same |
CN107679117A (en) | Rapid full-field dense point matching system |
WO2021070303A1 (en) | Computation processing device |
CN116090530A (en) | Systolic array structure and method capable of configuring convolution kernel size and parallel calculation number |
Legal Events
Code | Title | Description |
---|---|---|
GR01 | Patent grant | |
AV01 | Patent right actively abandoned | Granted publication date: 20190419; effective date of abandoning: 20230815 |