CN109416756A - Convolver and artificial intelligence processor using the same - Google Patents
Convolver and artificial intelligence processor using the same
- Publication number: CN109416756A (Application CN201880002156.XA)
- Authority: CN (China)
- Prior art keywords: data, row, convolver, convolution operation, buffer
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F5/06 — Methods or arrangements for data conversion, without changing the order or content of the data handled, for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
- G06F5/10 — Such arrangements having a sequence of storage locations each individually accessible for both enqueue and dequeue operations, e.g. using random access memory
- G06F7/38 — Methods or arrangements for performing computations using exclusively denominational number representation, e.g. binary, ternary, decimal
- G06F7/48 — Such computations using non-contact-making devices, e.g. tube, solid-state device
- G06F7/57 — Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483–G06F7/556 or for performing logical operations
- G06N3/02 — Neural networks
- G06N3/04 — Architecture, e.g. interconnection topology
- G06N3/045 — Combinations of networks
Abstract
A convolver (100) and an artificial intelligence processor using the same. The convolver is electrically connected to an external memory that stores the data to be processed and the weight parameters. The convolver (100) comprises a parameter register (110), an input buffer, a convolution operation circuit (150), and an output buffer (160). The parameter register (110) receives and outputs the weight parameters. The input buffer comprises multiple connected line buffers for receiving and outputting the data to be processed; each time the line buffers each output one datum, those outputs are gathered into a column of data. The convolution operation circuit (150) receives the data to be processed from the input buffer and the weight parameters from the parameter register (110), performs the convolution operation accordingly, and outputs the convolution result. The output buffer (160) receives the convolution result and outputs it to the external memory. This design solves the prior-art problem that software implementations of convolution slow down processing and place high demands on processor performance.
Description
Technical field
The present invention relates to the field of processor technology, and more particularly to artificial intelligence processors, specifically to a convolver and an artificial intelligence processor using the same.
Background art
A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a limited receptive field; CNNs perform outstandingly on large-scale image processing. A convolutional neural network comprises convolutional layers and pooling layers.
CNNs have become a research hotspot in many scientific fields, especially in pattern classification: because the network avoids complex image pre-processing and can take raw images directly as input, it has found wide application.
In general, the basic structure of a CNN has two kinds of layers. The first is the feature-extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which the local feature is extracted; once a local feature is extracted, its positional relationship to the other features is also determined. The second is the feature-mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in a plane share equal weights. The feature-mapping structure uses the sigmoid function, whose influence-function kernel is small, as the activation function of the convolutional network, so that the feature maps are shift-invariant. Moreover, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer of a convolutional neural network is followed by a computational layer that computes local averages and performs a second feature extraction; this characteristic two-stage feature-extraction structure reduces the feature resolution.
CNNs are mainly used to recognize two-dimensional patterns that are invariant to shift, scaling, and other forms of distortion. Because the feature-detection layers of a CNN learn from training data, explicit feature extraction is avoided when a CNN is used; features are learned implicitly from the training data. Furthermore, because the neurons on one feature-mapping plane share the same weights, the network can learn in parallel, a major advantage of convolutional networks over networks in which neurons are fully interconnected. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing; its layout is closer to that of real biological neural networks, weight sharing reduces the complexity of the network, and in particular a multi-dimensional input image can be fed directly into the network, avoiding the complexity of data reconstruction during feature extraction and classification.
At present, convolutional neural networks are implemented by software running on one processor or on multiple distributed processors. As the complexity of convolutional neural networks increases, processing slows down accordingly, and the performance demanded of the processors grows ever higher.
Summary of the invention
In view of the above deficiencies of the prior art, the purpose of the present invention is to provide a convolver and an artificial intelligence processor using the same, to solve the prior-art problem that implementing convolutional neural networks in software slows down processing and places high demands on processor performance.
To achieve the above and other related objects, the present invention provides a convolver electrically connected to an external memory, wherein the external memory stores the data to be processed and the weight parameters. The convolver comprises a parameter register, an input buffer, a convolution operation circuit, and an output buffer. The parameter register receives and outputs the weight parameters. The input buffer comprises multiple connected line buffers for receiving and outputting the data to be processed; each time the line buffers each output one datum, those outputs are gathered into a column of data. The convolution operation circuit receives the data to be processed from the input buffer and the weight parameters from the parameter register, performs the convolution operation accordingly, and outputs the convolution result. The output buffer receives the convolution result and outputs it to the external memory.
In one embodiment of the invention, the input buffer comprises a first line buffer, which receives the pixel data of the feature map to be processed one datum at a time, simultaneously outputs row pixel data for the filters, and stores the feature map input to each convolutional layer; the number of data items per output row equals the number of parallel filters.
In one embodiment of the invention, the first line buffer outputs the row pixel data of each convolutional layer in sequence, and when outputting the row pixel data of a given convolutional layer it outputs the row pixel data of each channel in sequence.
In one embodiment of the invention, the input buffer further comprises at least one second line buffer, comprising multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map; each row of pixel data is stored into the FIFO memories in turn along the path formed by the series connection. The second line buffer outputs pixel data as a Pf × K matrix, where Pf is the number of parallel filters and K is the size of the convolution kernel.
In one embodiment of the invention, the input buffer further comprises at least one matrix buffer, each matrix buffer consisting of multiple registers arranged as a matrix for storing pixel data; the size of the register matrix is Pf × K × 2. When the number of columns of input pixel data exceeds K, the matrix buffer outputs pixel data as a Pf × K × K matrix.
In one embodiment of the invention, the convolution operation circuit comprises multiple convolution kernels running in parallel, each convolution kernel containing multipliers for performing the convolution operation, and an adder tree that accumulates the outputs of the multipliers. Each convolver takes K × K matrix-form pixel data as input and, from the input pixel data and the weight parameters, outputs pixel data one datum at a time through the convolution operation.
In one embodiment of the invention, the output buffer comprises two parallel FIFO memories, into the same one of which the accumulated channel data of the same filter are stored, and a data selector that feeds each accumulated partial result back to the adder tree until the adder outputs the final accumulated result.
In one embodiment of the invention, the convolver further comprises a pooling circuit connected between the output buffer and the external memory, which pools the convolution result before outputting it to the external memory.
In one embodiment of the invention, the internal components of the convolver are connected to one another, and the convolver is connected to the external memory, through first-in-first-out data interfaces.
The present invention also provides an artificial intelligence processor comprising the convolver described above.
As described above, the convolver and the artificial intelligence processor of the invention have the following beneficial effects: the convolver is composed of hardware such as a parameter register, an input buffer, a convolution operation circuit, an output buffer, a pooling circuit, and first-in-first-out data interfaces, and can process highly complex convolutional neural network algorithms at high speed, effectively solving the prior-art problem that software implementations slow down processing and place high demands on processor performance.
Brief description of the drawings
Fig. 1 is an overall schematic diagram of a convolver.
Fig. 2 is an input/output schematic diagram of a convolver of the invention.
Fig. 3 is a schematic diagram of the second line buffer in a convolver of the invention.
Fig. 4 is an input/output schematic diagram of the matrix buffer in a convolver of the invention.
Fig. 5 is a structural schematic diagram of the output buffer in a convolver of the invention.
Reference numerals
100 convolver
110 parameter register
120 first line buffer
130 second line buffer
140 matrix buffer
150 convolution operation circuit
160 output buffer
170 pooling circuit
Detailed description of the embodiments
The embodiments of the present invention are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the disclosure of this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways from different viewpoints and for different applications without departing from the spirit of the invention. Note that, where no conflict arises, the following embodiments and the features within them can be combined with one another.
Note that the drawings provided in the following embodiments (Figs. 1 to 5) illustrate the basic concept of the invention only schematically: they show only the components relevant to the invention rather than the actual number, shape, and size of components in a real implementation, in which the type, quantity, and proportion of each component may vary arbitrarily and the component layout may be considerably more complex.
The purpose of this embodiment is to provide a convolver and an artificial intelligence processor using the same, to solve the prior-art problem that implementing convolutional neural networks in software slows down processing and places high demands on processor performance. The principles and implementation of the convolver and the artificial intelligence processor of this embodiment are described in detail below, so that those skilled in the art can understand them without creative work.
Specifically, as shown in Fig. 1, this embodiment provides a convolver 100 electrically connected to an external memory, wherein the external memory stores the data to be processed and the weight parameters. The convolver 100 comprises a parameter register 110, an input buffer, a convolution operation circuit 150, and an output buffer 160.
The data to be processed comprise multiple channels of data; the weight parameters comprise multiple layers of sub-parameters, each layer of sub-parameters corresponding one-to-one to a channel. Multiple convolution operation circuits 150 are provided, so that the convolution results of the respective channels can be computed in parallel.
In this embodiment, the parameter register 110 (Con_reg in Fig. 2) receives and outputs the weight parameters (Weight in Fig. 2). The parameter register 110 comprises a FIFO memory in which the weight parameters are stored. The configuration parameters of the input buffer, the convolution operation circuit 150, and the output buffer 160 are also stored in the parameter register 110 once configured.
In this embodiment, the input buffer comprises multiple connected line buffers for receiving and outputting the data to be processed; each time the line buffers each output one datum, those outputs are gathered into a column of data.
The input buffer comprises the first line buffer 120 (Conv_in_cache in Fig. 2), the second line buffer 130 (Conv_in_buffer in Fig. 2), and the matrix buffer 140 (Con_in_matrix in Fig. 2). Together, these stages take pixel data 1×1 at a time and produce Pv × K² pixel data as output, where Pv is the row-vector width and K is the size of the convolution kernel. The input buffer is described in detail below.
Specifically, in this embodiment, the first line buffer 120 (Conv_in_cache in Fig. 2) receives the pixel data of the feature map to be processed one datum at a time, simultaneously outputs row pixel data for the filters, and stores the feature map input to each convolutional layer; the number of data items per output row equals the number of parallel filters.
In this embodiment, the first line buffer 120 comprises a BRAM; the input pixel data of each convolutional layer's feature map are cached in the BRAM to improve the locality of pixel accesses.
In this embodiment, the first line buffer 120 outputs the row pixel data of each convolutional layer in sequence, and when outputting the row pixel data of a given convolutional layer it outputs the row pixel data of each channel in sequence. That is, the first line buffer 120 first outputs the pixel data of the first channel; when the pixel data of the first channel have been processed, it begins outputting the pixel data of the second channel; and once the pixel data of all channels of a convolutional layer have been output, it proceeds to output the channel pixel data of the next convolutional layer. In this way, the first line buffer 120 can iterate the computation with different filters from the first convolutional layer through the last.
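The layer-by-layer, channel-by-channel readout order described above can be sketched behaviorally. This is an illustrative Python model, not the patent's hardware; all names are assumptions:

```python
# Behavioral sketch of the first line buffer's readout order: for each
# convolutional layer, drain all rows of channel 0, then channel 1, and so
# on, before moving to the next layer.
def readout_order(layers):
    """layers: list of feature maps, each shaped [channels][rows][cols]."""
    for layer_idx, feature_map in enumerate(layers):
        for ch_idx, channel in enumerate(feature_map):
            for row in channel:
                yield layer_idx, ch_idx, row

# Layer 0 has 2 channels of 2 rows each; layer 1 has 1 channel of 1 row.
layers = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    [[[9, 10]]],
]
order = [(l, c) for l, c, _ in readout_order(layers)]
# Channel 0 of layer 0 drains completely before channel 1 begins.
assert order == [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0)]
```

The generator mirrors the buffer's sequencing only; in hardware the rows stream out of the BRAM rather than being held as nested lists.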
In this embodiment, the input buffer further comprises at least one second line buffer 130. As shown in Fig. 3, the second line buffer 130 comprises multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map; each row of pixel data is stored into the FIFO memories in turn along the path formed by the series connection. The second line buffer 130 receives Pf rows of pixel data and outputs pixel data as a Pf × K matrix, where Pf is the number of parallel filters and K is the size of the convolution kernel.
The first row of pixel data is stored in the first FIFO; the first FIFO then begins receiving the second row of pixel data while passing the first row on to the second FIFO. In this way, the two FIFOs hold two consecutive rows of pixel data and output them simultaneously.
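The chained-FIFO behavior above, where each FIFO holds one row so that vertically aligned pixels from K consecutive rows emerge together as a K-tall column, can be sketched as follows. This is an illustrative model assuming stride-1 readout; the names are not from the patent:

```python
from collections import deque

def line_buffer_columns(image, K):
    """image: list of rows streamed in order. In hardware, K-1 single-row
    FIFOs plus the live row give access to the K most recent rows; here a
    deque with maxlen=K models that window. Once K rows have arrived, each
    column index yields a K-tall column of vertically aligned pixels."""
    window = deque(maxlen=K)          # the K most recent rows
    for row in image:
        window.append(row)
        if len(window) == K:
            for c in range(len(row)):
                yield c, [window[r][c] for r in range(K)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
cols = [col for _, col in line_buffer_columns(image, K=3)]
assert cols == [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

These columns are exactly the Pf × K-shaped slices the text says the second line buffer feeds to the matrix buffer (here shown for a single filter lane).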
In this embodiment, the input buffer further comprises at least one matrix buffer 140. Each matrix buffer 140 consists of multiple registers arranged as a matrix for storing pixel data; the size of the register matrix is Pf × K × 2 (Fig. 4 shows the registers for K = 3). The matrix buffer 140 takes Pf × K matrix-form pixel data as input; once the number of input pixel columns exceeds K, the matrix buffer 140 outputs pixel data as a Pf × K × K matrix.
In this embodiment, the convolution operation circuit 150 receives the data to be processed from the input buffer and the weight parameters from the parameter register 110, performs the convolution operation accordingly, and outputs the convolution result.
Specifically, in this embodiment, the convolution operation circuit 150 comprises multiple convolution kernels running in parallel, each containing multipliers for performing the convolution operation, and an adder tree that accumulates the outputs of the multipliers. Each convolver 100 takes K × K matrix-form pixel data as input and, from the input pixel data and the weight parameters, outputs pixel data one datum at a time through the convolution operation.
For example, an image has three channels of data, R, G, and B, i.e. three two-dimensional matrices. Suppose the weight parameters of a filter have depth 3, i.e. three layers of sub-parameters, again three two-dimensional matrices, each of size K × K, and suppose K is the odd number 3. Each layer is convolved with its corresponding channel. If a data cube of Pv × K × 3 values is taken from the data to be processed (with Pv > K; suppose Pv is 5), a single convolution operation circuit 150 would need three passes to convolve the filter with the data cube. Preferably, therefore, three convolution operation circuits 150 are provided, one per channel, so that the convolutions of the respective channels can be performed in parallel within one clock cycle.
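The datapath described above, K × K multipliers firing in parallel with their products reduced by an adder tree, can be sketched behaviorally. The pairwise reduction mirrors the tree structure; function names are illustrative assumptions:

```python
def adder_tree(values):
    """Pairwise reduction, mirroring a hardware adder tree: each level sums
    adjacent pairs until one value remains."""
    vals = list(values)
    while len(vals) > 1:
        vals = [vals[i] + (vals[i + 1] if i + 1 < len(vals) else 0)
                for i in range(0, len(vals), 2)]
    return vals[0]

def convolve_window(window, weights):
    """K*K multipliers operate in parallel; the adder tree sums the products
    to produce one output pixel for this window position."""
    products = [w * x for row_w, row_x in zip(weights, window)
                for w, x in zip(row_w, row_x)]
    return adder_tree(products)

window  = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weights = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]  # cross-shaped 3x3 kernel
assert convolve_window(window, weights) == 2 + 4 + 5 + 6 + 8  # = 25
```

In software this is an ordinary dot product; the tree formulation matters in hardware because it turns K² sequential additions into log₂(K²) adder stages.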
In this embodiment, the output buffer 160 receives the convolution result and outputs it to the external memory.
Specifically, the output buffer 160 receives the convolution result of each channel, then accumulates the convolution results of all the channels and stores the accumulated result temporarily in the output buffer 160.
Specifically, in this embodiment, as shown in Fig. 5, the output buffer 160 comprises two parallel FIFO memories, into the same one of which the accumulated channel data of the same filter are stored, and a data selector (MUX) that feeds each accumulated partial result back to the adder tree until the adder outputs the final accumulated result. The number of adders equals Pf × Pv. The data selector (MUX) is also used to throttle the data stream down to 1×1, i.e. one pixel output at a time.
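The accumulate-and-feed-back loop described above, where per-channel convolution results for one filter are summed until every channel has contributed, reduces behaviorally to the following sketch (an illustrative model; in hardware the partial sums circulate through the FIFO and MUX rather than a Python variable):

```python
def accumulate_channels(channel_results):
    """Sketch of the output buffer's role: the per-channel convolution
    results for the same filter are looped back through the adder until
    every channel has been accumulated; only the final sum is emitted."""
    partial = None                     # models the FIFO holding partial sums
    for result in channel_results:     # one result grid per input channel
        if partial is None:
            partial = list(result)
        else:
            partial = [p + r for p, r in zip(partial, result)]
    return partial

# Three channels' convolution outputs at the same filter positions:
r_chan = [1, 2, 3]
g_chan = [10, 20, 30]
b_chan = [100, 200, 300]
assert accumulate_channels([r_chan, g_chan, b_chan]) == [111, 222, 333]
```

Two FIFOs (rather than one) plausibly allow one to drain finished sums while the other accumulates the next filter, but the patent text does not spell this out.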
In this embodiment, the convolver 100 further comprises a pooling circuit 170 connected between the output buffer 160 and the external memory, which pools the convolution result before outputting it to the external memory. The pooling circuit 170 performs maximum pooling over every two rows of pixel data and comprises a FIFO memory for storing one row of pixel data.
Specifically, the pooling mode can be max pooling or average pooling, and can be implemented with logic circuits.
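The two-row max pooling described above can be sketched as a 2×2, stride-2 reduction. In hardware the one-row FIFO holds the previous row while the current row streams in; this model operates directly on a stored array and is illustrative only:

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a list-of-rows feature map."""
    H, W = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c],     feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, W - 1, 2)]
            for r in range(0, H - 1, 2)]

fm = [[1, 3, 2, 4],
      [5, 0, 1, 1],
      [9, 2, 0, 7],
      [3, 8, 6, 5]]
assert max_pool_2x2(fm) == [[5, 4], [9, 7]]
```

Average pooling would replace `max(...)` with the mean of the four values; both reduce each output dimension by half.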
In this embodiment, the internal components of the convolver 100 are connected to one another, and the convolver 100 is connected to the external memory, through first-in-first-out data interfaces (the multiple SIF blocks in Fig. 2).
Specifically, each first-in-first-out data interface comprises a FIFO memory, a first logic unit, and a second logic unit.
The FIFO memory has, on the upstream side, a write-enable pin, a data-in pin, and a memory-full status pin; and, on the downstream side, a read-enable pin, a data-out pin, and a memory-empty status pin.
The first logic unit is connected to the upstream object, the write-enable pin, and the memory-full status pin. When it receives a write request from the upstream object, it determines from the signal on the memory-full status pin whether the FIFO memory is full; if not full, it sends an enable signal to the write-enable pin to make the FIFO memory writable; otherwise, the FIFO memory is made non-writable.
Specifically, the first logic unit comprises a first inverter, whose input is connected to the memory-full status pin and whose output leads to a first flag terminal for connection to the upstream object; and a first AND gate, whose first input is connected to the first data-valid flag terminal, whose second input is connected to an upstream data-valid terminal for connection to the upstream object, and whose output is connected to the write-enable pin.
The second logic unit is connected to the downstream object, the read-enable pin, and the memory-empty status pin. When it receives a read request from the downstream object, it determines from the signal on the memory-empty status pin whether the FIFO memory is empty; if not empty, it sends an enable signal to the read-enable pin to make the FIFO memory readable; otherwise, the FIFO memory is made non-readable.
Specifically, the second logic unit comprises a second inverter, whose input is connected to the memory-empty status pin and whose output leads to a downstream data-valid terminal for connection to the downstream object; and a second AND gate, whose first input is connected to the downstream data-valid terminal and whose second input is connected to a downstream data-valid flag terminal for connection to the downstream object.
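The full/empty handshake implemented by the two logic units, writes accepted only when the FIFO is not full, reads only when it is not empty, can be sketched behaviorally (an illustrative model of the protocol, not the gate-level circuit; names are assumptions):

```python
from collections import deque

class StreamFIFO:
    """Behavioral sketch of the first-in-first-out data interface: the
    write-enable is gated by the full flag (first logic unit) and the
    read-enable by the empty flag (second logic unit)."""
    def __init__(self, depth):
        self.depth = depth
        self.mem = deque()

    @property
    def full(self):
        return len(self.mem) == self.depth

    @property
    def empty(self):
        return len(self.mem) == 0

    def write(self, data):
        """Returns True iff the write was accepted (FIFO not full)."""
        if self.full:
            return False
        self.mem.append(data)
        return True

    def read(self):
        """Returns the oldest datum, or None if the FIFO is empty."""
        return self.mem.popleft() if not self.empty else None

fifo = StreamFIFO(depth=2)
assert fifo.write(1) and fifo.write(2)
assert not fifo.write(3)             # full: write-enable is deasserted
assert fifo.read() == 1 and fifo.read() == 2
assert fifo.read() is None           # empty: read-enable is deasserted
```

The same interface connecting every pair of stages is what lets each block of the pipeline run at its own rate without losing data.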
In this embodiment, the convolver 100 operates as follows.
The data to be processed are read from the external memory through a first-in-first-out data interface (one of the SIF blocks in Fig. 2) and stored in the BRAM of the first line buffer 120 (Conv_in_cache in Fig. 2).
The K × K weight parameters of one channel are read from the external memory through a first-in-first-out data interface (one of the SIF blocks in Fig. 2) and stored in the parameter register 110.
Once the parameter register 110 has been loaded with a weight parameter, the circuit starts receiving and processing the pixel data of the feature map through the first line buffer 120 (Conv_in_cache in Fig. 2), the second line buffer 130 (Conv_in_buffer in Fig. 2), and the matrix buffer 140 (Con_in_matrix in Fig. 2); the convolution operation circuit 150 receives Pv × K² pixel data per clock cycle.
The convolution operation circuit 150 performs the convolution and accumulation on the input data of each channel (the feature map input on each channel being of height H and width W), and then outputs the result of each channel to the output buffer 160. By cycling through the different input channels and accumulating the per-channel results in the output buffer 160, the feature map of size (H − K + 1) × (W − K + 1) corresponding to the filter is obtained.
The pooling circuit 170 can then receive the (H − K + 1) × (W − K + 1) pixel data and output the feature map after pooling, or the feature map can be output directly from the output buffer 160.
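The output dimensions quoted above follow from valid (no-padding, stride-1) convolution; a one-line check, included here purely as a worked example of the formula:

```python
def output_size(H, W, K):
    """Valid, stride-1 convolution: a K x K kernel over an H x W feature map
    yields an (H - K + 1) x (W - K + 1) output, i.e. the map accumulated in
    the output buffer."""
    return H - K + 1, W - K + 1

assert output_size(6, 6, 3) == (4, 4)
# A subsequent 2x2 pooling stage would then halve each output dimension.
```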
When the pooling circuit 170 or the output buffer 160 has output the feature map produced by one filter, the parameter register 110 is reloaded with another weight parameter, and the above pixel-processing flow is iterated with the different filters until the pixel processing of all the convolutional layers is complete.
This embodiment also provides an artificial intelligence processor comprising the convolver 100 described above. The convolver 100 has been described in detail above and is not described again here.
The artificial intelligence processor comprises a programmable logic circuit (PL) and a processing system circuit (PS). The processing system circuit comprises a central processing unit, which can be implemented with an MCU, SoC, FPGA, DSP, or the like, for example an embedded processor chip of the ARM architecture. The central processing unit is communicatively connected to the external memory 200, which is for example a RAM or ROM memory such as DDR3 or DDR4 SDRAM; the central processing unit can read data from and write data to the external memory.
In conclusion acoustic convolver of the invention is cached by parameter register, input buffer, convolution algorithm circuit, output
Device, the hardware such as pond computing circuit and first in, first out data-interface composition, can the high convolutional Neural net of high speed processing complexity
Network algorithm can effectively solve to realize that bring processing speed is slack-off by software operation in the prior art, to processor performance
Demanding problem.So the present invention effectively overcomes various shortcoming in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall be covered by the claims of the invention.
Claims (10)
1. A convolver, electrically connected to an external memory, wherein the external memory stores data to be processed and weight parameters, characterized in that the convolver comprises: a parameter register, an input buffer, a convolution operation circuit, and an output buffer;
the parameter register is configured to receive and output the weight parameters;
the input buffer comprises multiple connected line buffers configured to receive and output the data to be processed, wherein each time the line buffers each output one datum, those outputs are gathered into a column of data;
the convolution operation circuit is configured to receive the data to be processed from the input buffer and the weight parameters from the parameter register, to perform the convolution operation accordingly, and to output the convolution result; and
the output buffer is configured to receive the convolution result and to output the convolution result to the external memory.
2. The convolver according to claim 1, characterized in that the input buffer comprises:
a first line buffer, which receives the pixel data of the feature map to be processed one datum at a time, simultaneously outputs row pixel data for the filters, and stores the feature map input to each convolutional layer, wherein the number of data items per output row equals the number of parallel filters.
3. The convolver according to claim 2, characterized in that the first line buffer outputs the row pixel data of each convolutional layer in sequence, and when outputting the row pixel data of a given convolutional layer it outputs the row pixel data of each channel in sequence.
4. The convolver according to claim 2, characterized in that the input buffer further comprises:
at least one second line buffer, comprising multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map, wherein each row of pixel data is stored into the FIFO memories in turn along the path formed by the series connection; the second line buffer outputs pixel data as a Pf × K matrix, where Pf is the number of parallel filters and K is the size of the convolution kernel.
5. The convolver according to claim 4, wherein the input buffer further comprises:
at least one matrix buffer, each matrix buffer comprising a plurality of registers arranged in a matrix for storing pixel data, the register array being of size Pf × K × 2; when the number of columns of the input pixel data exceeds K, the matrix buffer outputs pixel data in Pf × K × K matrix form.
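For illustration, the matrix buffer of claim 5 can be sketched as follows, again with Pf = 1; the claimed Pf × K × 2 register array allows double buffering, for which a simple K-column deque stands in here:

```python
from collections import deque

def matrix_buffer(columns, K):
    """Behavioral model of the claimed matrix buffer: K-tall columns from the
    row buffer shift into a register window; once K columns are resident
    (i.e. the input is at least K columns wide), a full K x K pixel window
    is emitted each cycle."""
    regs = deque(maxlen=K)        # the most recent K columns
    windows = []
    for col in columns:
        regs.append(col)
        if len(regs) == K:
            # transpose the stored columns into a row-major K x K window
            windows.append([[regs[c][r] for c in range(K)] for r in range(K)])
    return windows
```

Feeding it the columns produced by a line buffer yields one K × K window per cycle after the initial fill, which is the input rate the convolution kernels of claim 6 expect.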
6. The convolver according to claim 5, wherein the convolution operation circuit comprises:
a plurality of convolution kernels operating in parallel, each convolution kernel comprising multipliers for performing the convolution operation; and
an adder tree that accumulates the output results of the multipliers;
each convolution kernel receives K × K matrix-form pixel data as input and outputs pixel data one datum at a time by convolving the input pixel data with the weight parameters.
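For illustration, one convolution kernel of claim 6 reduces to a bank of multipliers feeding a pairwise adder tree; a minimal sketch:

```python
def adder_tree(values):
    """Pairwise reduction, mirroring a hardware adder tree (log2 N levels)."""
    values = list(values)
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

def conv_kernel(window, weights):
    """One convolution kernel as claimed: K*K parallel multipliers whose
    products the adder tree accumulates into a single output pixel."""
    products = [p * w
                for wrow, krow in zip(window, weights)
                for p, w in zip(wrow, krow)]   # the parallel multiplier bank
    return adder_tree(products)
```

The pairwise structure matters in hardware: it gives a latency of ⌈log2(K·K)⌉ adder stages instead of the K·K stages of a sequential accumulator.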
7. The convolver according to claim 6, wherein the output buffer comprises:
two parallel FIFO memories, the channel data of a same filter being stored in the same FIFO memory after accumulation; and
a data selector for feeding each accumulated result back to the adder tree until the adder tree outputs the final accumulation result.
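For illustration, the channel accumulation of claim 7 can be sketched behaviorally: the running sums held in one FIFO are fed back through the adder (the data selector's feedback path) while the other FIFO fills, the two alternating ping-pong fashion:

```python
from collections import deque

def accumulate_channels(channel_partials):
    """Behavioral model of the claimed output buffer, where
    channel_partials[ch][i] is the i-th output pixel contributed by input
    channel ch of one filter. Partial sums circulate between two FIFOs,
    one draining through the feedback adder while the other fills."""
    fifos = [deque(channel_partials[0]), deque()]   # first channel seeds FIFO 0
    src = 0
    for ch in channel_partials[1:]:
        dst = 1 - src                               # swap roles of the two FIFOs
        for new in ch:
            # feedback: stored partial sum + next channel's contribution
            fifos[dst].append(fifos[src].popleft() + new)
        src = dst
    return list(fifos[src])
```

After the last channel has passed through, the surviving FIFO holds the fully accumulated output pixels of that filter.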
8. The convolver according to claim 1, wherein the convolver further comprises:
a pooling operation circuit connected between the output buffer and the external memory, for pooling the convolution result before outputting it to the external memory.
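For illustration, the pooling stage of claim 8 reduces each non-overlapping P × P block of the convolution result to one value before the write-out; max pooling is assumed here, as the claim does not fix the pooling type:

```python
def max_pool(feature_map, P=2):
    """Behavioral model of the claimed pooling circuit between the output
    buffer and external memory: non-overlapping P x P blocks of the
    convolution result are each reduced to their maximum."""
    H, W = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i + di][j + dj]
                 for di in range(P) for dj in range(P))
             for j in range(0, W - P + 1, P)]
            for i in range(0, H - P + 1, P)]
```

Placing this circuit before the external-memory interface cuts the write-back bandwidth by a factor of P², which is the usual motivation for fusing pooling into the accelerator pipeline.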
9. The convolver according to claim 1, wherein the internal components of the convolver are connected to one another, and the convolver is connected to the external memory, through first-in, first-out (FIFO) data interfaces.
10. An artificial intelligence processor, comprising the convolver according to any one of claims 1 to 9.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/072678 WO2019136764A1 (en) | 2018-01-15 | 2018-01-15 | Convolutor and artificial intelligent processing device applied thereto |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109416756A true CN109416756A (en) | 2019-03-01 |
Family
ID=65462114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880002156.XA Pending CN109416756A (en) | 2018-01-15 | 2018-01-15 | Convolver and artificial intelligence processing device applied thereto |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109416756A (en) |
WO (1) | WO2019136764A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110727633A (en) * | 2019-09-17 | 2020-01-24 | 广东高云半导体科技股份有限公司 | Edge artificial intelligence computing system framework based on SoC FPGA |
CN111047010A (en) * | 2019-11-25 | 2020-04-21 | 天津大学 | Method and device for reducing first-layer convolution calculation delay of CNN accelerator |
TWI766568B (en) * | 2020-04-17 | 2022-06-01 | 神盾股份有限公司 | Processing device for executing convolution neural network computation and operation method thereof |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017129325A1 (en) * | 2016-01-29 | 2017-08-03 | Fotonation Limited | A convolutional neural network |
US10497089B2 (en) * | 2016-01-29 | 2019-12-03 | Fotonation Limited | Convolutional neural network |
GB201607713D0 (en) * | 2016-05-03 | 2016-06-15 | Imagination Tech Ltd | Convolutional neural network |
2018
- 2018-01-15 WO PCT/CN2018/072678 patent/WO2019136764A1/en active Application Filing
- 2018-01-15 CN CN201880002156.XA patent/CN109416756A/en active Pending
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107454966A (en) * | 2015-05-21 | 2017-12-08 | 谷歌公司 | Prefetching weights for a neural network processor |
CN104915322A (en) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | Method for accelerating convolutional neural network hardware and AXI bus IP core thereof |
US20160379109A1 (en) * | 2015-06-29 | 2016-12-29 | Microsoft Technology Licensing, Llc | Convolutional neural networks on hardware accelerators |
WO2017003887A1 (en) * | 2015-06-29 | 2017-01-05 | Microsoft Technology Licensing, Llc | Convolutional neural networks on hardware accelerators |
EP3153996A2 (en) * | 2015-10-07 | 2017-04-12 | Altera Corporation | Method and apparatus for implementing layers on a convolutional neural network accelerator |
US20170236053A1 (en) * | 2015-12-29 | 2017-08-17 | Synopsys, Inc. | Configurable and Programmable Multi-Core Architecture with a Specialized Instruction Set for Embedded Application Based on Neural Networks |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | FPGA-based deep convolutional neural network implementation method |
CN107229967A (en) * | 2016-08-22 | 2017-10-03 | 北京深鉴智能科技有限公司 | FPGA-based hardware accelerator and method for sparse GRU neural networks |
CN106970896A (en) * | 2017-03-30 | 2017-07-21 | 中国人民解放军国防科学技术大学 | Vector processor-oriented vectorization implementation method for two-dimensional matrix convolution |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | FPGA-based general fixed-point neural network convolution accelerator hardware architecture |
Non-Patent Citations (3)
Title |
---|
HONGXIANG FAN et al.: "F-C3D: FPGA-based 3-Dimensional Convolutional Neural Network", 2017 27th International Conference on Field Programmable Logic and Applications (FPL) * |
LI DU et al.: "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things", IEEE Transactions on Circuits and Systems I: Regular Papers * |
LU Zhijian: "Research on FPGA-based parallel architectures for convolutional neural networks", China Doctoral Dissertations Full-text Database, Information Science and Technology * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978161A (en) * | 2019-03-08 | 2019-07-05 | 吉林大学 | Universal convolution-pooling synchronous processing convolution kernel system |
CN109978161B (en) * | 2019-03-08 | 2022-03-04 | 吉林大学 | Universal convolution-pooling synchronous processing convolution kernel system |
CN109992225A (en) * | 2019-04-04 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Data output method and relevant apparatus |
CN109992225B (en) * | 2019-04-04 | 2022-02-22 | 中科寒武纪科技股份有限公司 | Data output method and related device |
CN110866597A (en) * | 2019-09-27 | 2020-03-06 | 珠海博雅科技有限公司 | Data processing circuit and data processing method |
WO2021072732A1 (en) * | 2019-10-18 | 2021-04-22 | 北京希姆计算科技有限公司 | Matrix computing circuit, apparatus and method |
CN112784973A (en) * | 2019-11-04 | 2021-05-11 | 北京希姆计算科技有限公司 | Convolution operation circuit, device and method |
CN111814675A (en) * | 2020-07-08 | 2020-10-23 | 上海雪湖科技有限公司 | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA |
CN111814675B (en) * | 2020-07-08 | 2023-09-29 | 上海雪湖科技有限公司 | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA |
CN112101178A (en) * | 2020-09-10 | 2020-12-18 | 电子科技大学 | Intelligent SOC terminal assisting blind people in perceiving external environment |
CN112101178B (en) * | 2020-09-10 | 2023-03-24 | 电子科技大学 | Intelligent SOC terminal assisting blind people in perceiving external environment |
WO2022095632A1 (en) * | 2020-11-06 | 2022-05-12 | 苏州浪潮智能科技有限公司 | Method and apparatus for implementing data convolution operation on basis of fpga, and medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019136764A1 (en) | 2019-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109416756A (en) | Convolver and artificial intelligence processing device applied thereto | |
US11720523B2 (en) | Performing concurrent operations in a processing element | |
CN110458279B (en) | FPGA-based binary neural network acceleration method and system | |
JP6960700B2 (en) | Multicast Network On-Chip Convolutional Neural Network Hardware Accelerator and Its Behavior | |
CN107704923A (en) | Convolutional neural networks computing circuit | |
CN107341544A (en) | Reconfigurable accelerator based on divisible array and implementation method thereof |
CN108733348B (en) | Fused vector multiplier and method for performing operation using the same | |
CN108108809A (en) | Hardware architecture for accelerating convolutional neural network inference and working method thereof |
CN106022468A (en) | Artificial neural network processor integrated circuit and design method therefor | |
CN110348574A (en) | ZYNQ-based general convolutional neural network acceleration structure and design method |
Dundar et al. | Memory access optimized routing scheme for deep networks on a mobile coprocessor | |
CN109496319A (en) | Artificial intelligence processing device hardware optimization method, system, storage medium, and terminal |
CN109416755A (en) | Artificial intelligence parallel processing method and device, readable storage medium, and terminal |
CN111210019A (en) | Neural network inference method based on software and hardware cooperative acceleration | |
WO2023123919A1 (en) | Data processing circuit, data processing method, and related product | |
CN109740619B (en) | Neural network terminal operation method and device for target recognition | |
Chang et al. | VSCNN: Convolution neural network accelerator with vector sparsity | |
CN108256640A (en) | Convolutional neural networks implementation method | |
CN110059809A (en) | A kind of computing device and Related product | |
CN110178146A (en) | Deconvolver and artificial intelligence processing device applied thereto |
CN214586992U (en) | Neural network accelerating circuit, image processor and three-dimensional imaging electronic equipment | |
CN108256637A (en) | Cellular array three-dimensional communication transmission method |
CN109740729A (en) | Operation method, device and related product |
CN109416743A (en) | Three-dimensional convolution device for recognizing human actions |
CN109542513A (en) | Convolutional neural network instruction data storage system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190301 |