CN109416756A - Convolver and artificial intelligence processor using the same - Google Patents

Convolver and artificial intelligence processor using the same

Info

Publication number
CN109416756A
Authority
CN
China
Prior art keywords
data
row
convolver
convolution operation
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201880002156.XA
Other languages
Chinese (zh)
Inventor
肖梦秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Publication of CN109416756A publication Critical patent/CN109416756A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G06F5/10Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor having a sequence of storage locations each being individually accessible for both enqueue and dequeue operations, e.g. using random access memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

A convolver (100) and an artificial intelligence processor using the same. The convolver is electrically connected to an external memory, and the external memory stores data to be processed and weight parameters. The convolver (100) comprises: a parameter register (110), an input buffer, a convolution operation circuit (150) and an output buffer (160). The parameter register (110) is used to receive and output the weight parameters. The input buffer comprises multiple connected line buffers used to receive and output the data to be processed; each time every line buffer has output one datum, the outputs are gathered to form one column of data. The convolution operation circuit (150) is used to receive the data to be processed from the input buffer, receive the weight parameters from the parameter register (110), perform the convolution operation accordingly and output the convolution result. The output buffer (160) is used to receive the convolution result and output it to the external memory. The solution addresses the prior-art problem that software implementations slow down processing and place high demands on processor performance.

Description

Convolver and artificial intelligence processor using the same
Technical field
The present invention relates to the field of processor technology, and in particular to the field of artificial intelligence processors, specifically to a convolver and an artificial intelligence processor using the same.
Background art
A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a limited receptive field; it performs outstandingly in large-scale image processing. A convolutional neural network comprises convolutional layers and pooling layers.
Today, CNNs have become one of the research hotspots in many scientific fields, especially in pattern classification: because the network avoids complex pre-processing of the image and can take the original image directly as input, it has found wide application.
In general, the basic structure of a CNN comprises two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer and extracts the feature of that local region; once the local feature is extracted, its positional relationship with other features is also determined. The second is the feature mapping layer: each computation layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in the plane share equal weights. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature maps are shift invariant. Furthermore, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer in a convolutional neural network is followed by a computation layer that computes local averages and performs a second extraction; this characteristic two-stage feature extraction structure reduces the feature resolution.
CNNs are mainly used to recognize two-dimensional patterns that are invariant to shift, scaling and other forms of distortion. Because the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when using a CNN; features are learned implicitly from the training data. Moreover, since the neurons on the same feature map have identical weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are fully interconnected. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing; its layout is closer to real biological neural networks, weight sharing reduces the complexity of the network, and in particular an image consisting of multi-dimensional input vectors can be fed into the network directly, which avoids the complexity of data reconstruction during feature extraction and classification.
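Purely as an illustrative aside (not part of the patent text; the function name and array layout are assumptions), the weight sharing described above can be summarized in a few lines: one small kernel is reused at every position of the input to produce the whole feature map.

```python
def feature_map(image, kernel):
    """Weight sharing in a nutshell: a single K x K kernel produces the whole feature map.

    Every output position reuses the same weights, which is why a convolutional
    layer has far fewer free parameters than a fully connected layer.
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(w - k + 1)]
            for r in range(h - k + 1)]
```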
At present, convolutional neural networks are computed by software running on a processor or on multiple distributed processors. As the complexity of convolutional neural networks increases, processing becomes relatively slow, and the performance requirements placed on the processors grow higher and higher.
Summary of the invention
In view of the above shortcomings of the prior art, the purpose of the present invention is to provide a convolver and an artificial intelligence processor using the same, to solve the prior-art problem that implementing convolutional neural networks in software slows down processing and places high demands on processor performance.
To achieve the above and other related objects, the present invention provides a convolver electrically connected to an external memory, wherein the external memory stores data to be processed and weight parameters. The convolver comprises: a parameter register, an input buffer, a convolution operation circuit and an output buffer. The parameter register is used to receive and output the weight parameters. The input buffer comprises multiple connected line buffers used to receive and output the data to be processed; each time every line buffer has output one datum, the outputs are gathered to form one column of data. The convolution operation circuit is used to receive the data to be processed from the input buffer, receive the weight parameters from the parameter register, perform the convolution operation accordingly and output the convolution result. The output buffer is used to receive the convolution result and output the convolution result to the external memory.
In one embodiment of the invention, the input buffer comprises a first line buffer, which receives the pixel data of the feature map to be processed one datum at a time, outputs rows of pixel data in parallel after the filter stage, and stores the feature map of each convolutional layer that is input; the number of data in each pixel row equals the number of parallel filters.
In one embodiment of the invention, the first line buffer outputs the row pixel data of each convolutional layer in sequence, and while outputting the row pixel data of each convolutional layer it outputs the row pixel data of each channel in sequence.
In one embodiment of the invention, the input buffer further comprises at least one second line buffer, which includes multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map; each row of pixel data is stored into the FIFO memories in turn along the path formed by the series-connected FIFOs. The second line buffer outputs pixel data in a Pf × K matrix form, where Pf is the number of parallel filters and K is the size of the convolution kernel.
In one embodiment of the invention, the input buffer further comprises at least one matrix buffer. Each matrix buffer consists of multiple registers arranged in a matrix for storing pixel data; the size of the register array is Pf × K × 2. When the number of columns of the input pixel data exceeds K, the matrix buffer outputs pixel data in a Pf × K × K matrix form.
In one embodiment of the invention, the convolution operation circuit comprises multiple convolution kernels running in parallel, each convolution kernel containing multipliers for performing the convolution operation, and an adder tree that accumulates the outputs of the multipliers. Each convolver takes K × K matrix-form pixel data as input and, based on the input pixel data and the weight parameters, outputs pixel data one datum at a time through the convolution operation.
In one embodiment of the invention, the output buffer comprises two parallel FIFO memories, into the same one of which the channel data of the same filter are stored after being accumulated, and a data selector that feeds each accumulated result back to the adder tree until the adder outputs the final accumulation result.
In one embodiment of the invention, the convolver further comprises a pooling circuit connected between the output buffer and the external memory, for pooling the convolution result and then outputting it to the external memory.
In one embodiment of the invention, the internal components of the convolver are connected to one another, and the convolver is connected to the external memory, through first-in first-out (FIFO) data interfaces.
The present invention also provides an artificial intelligence processor, which includes the convolver described above.
As described above, the convolver and the artificial intelligence processor using it have the following beneficial effects:
The convolver of the invention is composed of hardware such as a parameter register, an input buffer, a convolution operation circuit, an output buffer, a pooling circuit and FIFO data interfaces. It can process highly complex convolutional neural network algorithms at high speed, effectively solving the prior-art problem that software implementations slow down processing and place high demands on processor performance.
Brief description of the drawings
Fig. 1 is an overall schematic diagram of a convolver in the prior art.
Fig. 2 is an input/output schematic diagram of a convolver of the invention.
Fig. 3 is a schematic diagram of the second line buffer in a convolver of the invention.
Fig. 4 is an input/output schematic diagram of the matrix buffer in a convolver of the invention.
Fig. 5 is a structural schematic diagram of the output buffer in a convolver of the invention.
Reference numerals
100 Convolver
110 Parameter register
120 First line buffer
130 Second line buffer
140 Matrix buffer
150 Convolution operation circuit
160 Output buffer
170 Pooling circuit
Detailed description of the embodiments
The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the invention. It should be noted that, as long as there is no conflict, the following embodiments and the features in the embodiments can be combined with one another.
It should be noted that the drawings provided in the following embodiments (Figs. 1 to 5) only illustrate the basic concept of the invention in a schematic way. The drawings show only the components related to the invention rather than the actual number, shape and size of the components in implementation; in actual implementation the form, quantity and proportion of each component may vary arbitrarily, and the component layout may also be more complex.
The purpose of this embodiment is to provide a convolver and an artificial intelligence processor using the same, to solve the prior-art problem that implementing convolutional neural networks in software slows down processing and places high demands on processor performance. The principles and implementations of the convolver and the artificial intelligence processor of this embodiment are described in detail below, so that those skilled in the art can understand them without creative work.
Specifically, as shown in Fig. 1, this embodiment provides a convolver 100 electrically connected to an external memory, wherein the external memory stores data to be processed and weight parameters. The convolver 100 comprises: a parameter register 110, an input buffer, a convolution operation circuit 150 and an output buffer 160.
The first data to be processed includes multiple channels of data; the first weight parameter includes multiple layers of sub-parameters, each layer of sub-parameters corresponding one-to-one to a channel of data. There are multiple convolution operation circuits 150, which compute the convolution results of the respective channels in parallel.
In this embodiment, the parameter register 110 (Con_reg in Fig. 2) is used to receive and output the weight parameters (Weight in Fig. 2). The parameter register 110 contains a FIFO memory in which the weight parameters are stored. The parameters of the input buffer, the convolution operation circuit 150 and the output buffer 160 are also stored in the parameter register 110 after they have been configured.
In this embodiment, the input buffer comprises multiple connected line buffers used to receive and output the data to be processed; each time every line buffer has output one datum, the outputs are gathered to form one column of data.
The input buffer comprises the first line buffer 120 (Conv_in_cache in Fig. 2), the second line buffer 130 (Conv_in_buffer in Fig. 2) and the matrix buffer 140 (Con_in_matrix in Fig. 2). Together, the first line buffer 120, the second line buffer 130 and the matrix buffer 140 process the 1×1 pixel-data input into a Pv*K² pixel-data output, where Pv is the size of the row vector and K is the size of the convolution kernel. The input buffer is described in detail below.
Specifically, in this embodiment, the first line buffer 120 (Conv_in_cache in Fig. 2) receives the pixel data of the feature map to be processed one datum at a time, outputs rows of pixel data in parallel after the filter stage, and stores the feature map of each convolutional layer that is input; the number of data in each pixel row equals the number of parallel filters.
In this embodiment, the first line buffer 120 contains a BRAM; the input pixel data of the feature map of each convolutional layer is buffered in the BRAM to improve the locality of the pixel data.
In this embodiment, the first line buffer 120 outputs the row pixel data of each convolutional layer in sequence, and while outputting the row pixel data of a convolutional layer it outputs the row pixel data of each channel in sequence. That is, the first line buffer 120 first outputs the pixel data of the first channel; when the pixel data of the first channel has been processed, it starts to output the pixel data of the second channel; and after the pixel data of all channels of one convolutional layer has been output, it outputs the pixel data of the channels of the next convolutional layer. From the first convolutional layer to the last convolutional layer, the first line buffer 120 can iterate the computation and output with different filters.
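As a purely behavioral sketch of this output ordering (the function and variable names are illustrative assumptions, not identifiers from the design), the traversal can be written as three nested loops over layers, channels and rows:

```python
def first_line_buffer_output(feature_maps):
    """Model of the first line buffer's output order: layer by layer,
    channel by channel, row by row.

    feature_maps: list with one entry per convolutional layer, each entry a
    3-D list shaped [channel][row][column]. Behavioral sketch only, not RTL.
    """
    for layer_idx, layer in enumerate(feature_maps):   # one convolutional layer at a time
        for ch_idx, channel in enumerate(layer):       # all rows of a channel before the next channel
            for row in channel:
                yield layer_idx, ch_idx, row           # one row of pixel data per step
```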
In this embodiment, the input buffer further comprises at least one second line buffer 130. As shown in Fig. 3, the second line buffer 130 includes multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map; each row of pixel data is stored into the FIFO memories in turn along the path formed by the series-connected FIFOs. The second line buffer 130 receives Pf rows of pixel data and outputs pixel data in a Pf × K matrix form, where Pf is the number of parallel filters and K is the size of the convolution kernel.
The first row of pixel data is stored in the first FIFO; the first FIFO then starts to receive the second row of pixel data and outputs the first row of pixel data into the second FIFO. In this way the two FIFOs hold two consecutive rows of pixel data at the same time and output them together.
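A minimal behavioral sketch of such a FIFO chain is given below for a single channel and a single filter (a simplifying assumption; the deque-based modeling and the name `line_buffer_chain` are illustrative, not the patent's implementation). For every incoming pixel it emits the column of K vertically adjacent pixels, once K-1 full rows have been buffered.

```python
from collections import deque

def line_buffer_chain(pixel_stream, width, k=3):
    """Model K-1 series FIFOs, each one image row (`width` pixels) deep.

    Yields, for each incoming pixel, the column of K vertically adjacent pixels
    (top row first, current row last). Columns are only emitted once enough
    rows have been buffered for a full column to exist.
    """
    fifos = [deque() for _ in range(k - 1)]  # fifos[0]: previous row, fifos[1]: the row above it, ...
    for px in pixel_stream:
        column = [px]
        carry = px
        for fifo in fifos:
            popped = fifo.popleft() if len(fifo) == width else None  # FIFO full: shift one pixel up the chain
            fifo.append(carry)
            carry = popped
            column.append(popped)            # pixel from one row further up (None while warming up)
        if column[-1] is not None:           # emit only fully formed columns
            yield list(reversed(column))
```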
In this embodiment, the input buffer further comprises at least one matrix buffer 140. Each matrix buffer 140 consists of multiple registers arranged in a matrix for storing pixel data, and the size of the register array is Pf × K × 2; Fig. 4 shows the registers for K = 3. The matrix buffer 140 takes Pf × K matrix-form pixel data as input and, when the number of columns of the input pixel data exceeds K, outputs pixel data in a Pf × K × K matrix form.
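Again as an illustrative sketch only (single window, assumed names), the matrix buffer can be modeled as a register window that shifts in one column per cycle and emits a K × K patch once K columns have been collected:

```python
def matrix_buffer(column_stream, k=3):
    """Turn a stream of K-tall columns into sliding K x K windows.

    column_stream yields lists of K pixels (e.g. the output of line_buffer_chain).
    A window becomes valid once K columns have been shifted in and then slides
    by one column for every further input.
    """
    window = []                                # the K most recent columns
    for col in column_stream:
        window.append(col)
        if len(window) > k:
            window.pop(0)                      # shift-register behavior: drop the oldest column
        if len(window) == k:
            # row r of the patch is element r of each stored column
            yield [[window[c][r] for c in range(k)] for r in range(k)]
```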
In this embodiment, the convolution operation circuit 150 is used to receive the data to be processed from the input buffer, receive the weight parameters from the parameter register 110, perform the convolution operation accordingly, and output the convolution result.
Specifically, in this embodiment, the convolution operation circuit 150 comprises multiple convolution kernels running in parallel, each convolution kernel containing multipliers for performing the convolution operation, and an adder tree that accumulates the output results of the multipliers. Each convolver 100 takes K × K matrix-form pixel data as input and, based on the input pixel data and the weight parameters, outputs pixel data one datum at a time through the convolution operation.
For example, an image has three channels of data, R, G and B, i.e. three two-dimensional matrices. Suppose the first weight parameter is a filter with a depth of 3, i.e. three layers of sub-weight-parameters, which are three two-dimensional matrices each of size K*K, and suppose K is the odd number 3; each layer is convolved with the corresponding channel. When a data cube of Pv*K*3 data is taken from the first data to be processed (Pv > K), say Pv = 5, the filter needs three passes through the convolution operation circuit 150 to finish the operation on that data cube. Preferably, a corresponding number of convolution operation circuits 150 can be provided, three in this case, so that the convolution operations of the respective channels can be carried out in parallel within one clock cycle.
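The multiply-then-adder-tree scheme and the parallel per-channel circuits can be sketched numerically as follows (an illustration under the assumptions above; `conv_kernel` and `conv_rgb` are names introduced here, not taken from the patent):

```python
def conv_kernel(window, weights):
    """One convolution kernel: K*K parallel multiplies followed by an adder tree."""
    products = [p * w for row_p, row_w in zip(window, weights)
                      for p, w in zip(row_p, row_w)]          # the multipliers
    while len(products) > 1:                                  # adder tree: pairwise reduction
        products = [products[i] + products[i + 1] if i + 1 < len(products) else products[i]
                    for i in range(0, len(products), 2)]
    return products[0]

def conv_rgb(windows_per_channel, filter_weights):
    """Three channels (R, G, B) convolved by three parallel circuits, then accumulated."""
    return sum(conv_kernel(win, w)
               for win, w in zip(windows_per_channel, filter_weights))
```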
In this embodiment, the output buffer 160 is used to receive the convolution result and output the convolution result to the external memory.
Specifically, the output buffer 160 receives the convolution result of each channel, then accumulates the convolution results of the data of all channels and stores them temporarily in the output buffer 160.
Specifically, in this embodiment, as shown in Fig. 5, the output buffer 160 comprises two parallel FIFO memories, into the same one of which the channel data of the same filter are stored after being accumulated, and a data selector (MUX) that feeds each accumulated result back to the adder tree until the adder outputs the final accumulation result.
The number of adders equals Pf*Pv. In addition, the data selector (MUX) is also used to slow the data stream down to 1*1, i.e. pixel-by-pixel output.
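The feedback accumulation through the output FIFO and the MUX can be sketched behaviorally as follows (an assumption-laden simplification: one filter, partial sums represented as plain lists rather than FIFO contents):

```python
def accumulate_channels(channel_results):
    """Accumulate per-channel partial sums for one filter.

    channel_results: iterable of lists, each holding the per-pixel partial sums
    of one input channel. The running total stands in for the FIFO contents that
    the MUX feeds back into the adder tree.
    """
    acc = None
    for partial in channel_results:
        if acc is None:
            acc = list(partial)                          # first channel initializes the buffer
        else:
            acc = [a + p for a, p in zip(acc, partial)]  # fed back and added to the next channel
    return acc                                           # final result, streamed out pixel by pixel
```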
In this embodiment, the convolver 100 further comprises a pooling circuit 170 connected between the output buffer 160 and the external memory, for pooling the convolution result and then outputting it to the external memory.
The pooling circuit 170 performs max pooling on every two rows of pixel data; the pooling circuit 170 contains one FIFO memory for storing one row of pixel data.
Specifically, the pooling mode may be max pooling or average pooling, and it can be implemented with logic circuits.
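A short sketch of 2 × 2 max pooling with a single row FIFO, as described above (the 2 × 2 window size and the code layout are assumptions for illustration):

```python
def max_pool_2x2(rows):
    """2x2 max pooling over a stream of pixel rows.

    One buffered row (the FIFO in the pooling circuit) is paired with the next
    incoming row, and every 2x2 block is reduced to its maximum.
    """
    row_fifo = None
    for row in rows:
        if row_fifo is None:
            row_fifo = row                                   # buffer the first row of each pair
            continue
        yield [max(row_fifo[c], row_fifo[c + 1], row[c], row[c + 1])
               for c in range(0, len(row) - 1, 2)]
        row_fifo = None                                      # pair consumed, wait for the next one
```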
In this embodiment, the internal components of the convolver 100 are connected to one another, and the convolver 100 is connected to the external memory, through first-in first-out data interfaces (the SIF blocks in Fig. 2).
Specifically, the FIFO data interface includes a FIFO memory, a first logic unit and a second logic unit.
The FIFO memory has, on the upstream side, a write-enable pin, a data-in pin and a memory-full flag pin, and, on the downstream side, a read-enable pin, a data-out pin and a memory-empty flag pin.
The first logic unit is connected to the upstream object, the write-enable pin and the memory-full flag pin. When it receives a write request from the upstream object, it determines from the signal on the memory-full flag pin whether the FIFO memory is full; if the memory is not full, it sends an enable signal to the write-enable pin to make the FIFO memory writable; otherwise it makes the FIFO memory non-writable.
Specifically, the first logic unit includes a first inverter, whose input is connected to the memory-full flag pin and whose output is led out as the first flag terminal for connecting the upstream object, and a first AND gate, whose first input is connected to the first data-valid flag terminal, whose second input is connected to the upstream-data-valid terminal for connecting the upstream object, and whose output is connected to the write-enable pin.
The second logic unit is connected to the downstream object, the read-enable pin and the memory-empty flag pin. When it receives a read request from the downstream object, it determines from the signal on the memory-empty flag pin whether the FIFO memory is empty; if the memory is not empty, it sends an enable signal to the read-enable pin to make the FIFO memory readable; otherwise it makes the FIFO memory unreadable.
Specifically, the second logic unit includes a second inverter, whose input is connected to the memory-empty flag pin and whose output is led out as the downstream-data-valid terminal for connecting the downstream object, and a second AND gate, whose first input is connected to the downstream-data-valid terminal and whose second input is connected to the downstream-data-valid flag terminal for connecting the downstream object.
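The inverter-plus-AND-gate handshake on both sides of the FIFO reduces to two Boolean expressions, sketched below (the read side is assumed to gate the read-enable pin symmetrically to the write side; this is an illustrative assumption, not a statement of the circuit):

```python
def write_enable(full_flag: bool, upstream_valid: bool) -> bool:
    """First logic unit: write-enable = NOT memory-full AND upstream data valid."""
    not_full = not full_flag               # first inverter on the memory-full flag
    return not_full and upstream_valid     # first AND gate drives the write-enable pin

def read_enable(empty_flag: bool, downstream_ready: bool) -> bool:
    """Second logic unit: read-enable = NOT memory-empty AND downstream ready."""
    not_empty = not empty_flag             # second inverter on the memory-empty flag
    return not_empty and downstream_ready  # second AND gate gates the read-enable pin
```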
In this embodiment, the operation flow of the convolver 100 is as follows:
The data to be processed is read from the external memory through the FIFO data interfaces (the SIF blocks in Fig. 2) and stored into the BRAM of the first line buffer 120 (Conv_in_cache in Fig. 2).
The k*k weight parameters (per channel) are read from the external memory through the FIFO data interfaces (the SIF blocks in Fig. 2) and stored into the parameter register 110.
Once the parameter register 110 has been loaded with a weight parameter, the convolver starts receiving and processing the pixel data of the feature map. After processing by the first line buffer 120 (Conv_in_cache in Fig. 2), the second line buffer 130 (Conv_in_buffer in Fig. 2) and the matrix buffer 140 (Con_in_matrix in Fig. 2), the convolution operation circuit 150 receives Pv*K² pixel data per clock cycle.
The convolution operation circuit 150 performs convolution and accumulation on the input data of each channel (the feature map of each input channel having height H and width W) and then outputs the result of each channel to the output buffer 160.
The different input channels are accessed cyclically, and the output buffer 160 accumulates the data results of the individual channels, yielding the feature map of size (H-K+1) * (W-K+1) corresponding to the filter.
The pooling circuit 170 can then receive the (H-K+1) * (W-K+1) pixel data and output the feature map after pooling, or the feature map can be output directly from the output buffer 160.
After the pooling circuit 170 or the output buffer 160 has output the feature map processed by one filter, the parameter register 110 is reloaded with a new weight parameter, and the above pixel processing flow is iterated with different filters until the pixel processing of all convolutional layers is completed.
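Pulling the steps above together, the following end-to-end sketch chains the illustrative models introduced earlier (`line_buffer_chain`, `matrix_buffer`, `conv_kernel`, `max_pool_2x2`) for a single channel and a single filter; these simplifications, and the boundary handling, are assumptions of this sketch rather than the patent's dataflow.

```python
def convolver_run(feature_map_in, weights, width, k=3, use_pooling=True):
    """Behavioral model of one filter pass over one H x `width` channel.

    weights stands in for the contents of the parameter register; the result is
    the (H-K+1) x (W-K+1) feature map, optionally pooled before being written out.
    """
    pixel_stream = (px for row in feature_map_in for px in row)
    columns = line_buffer_chain(pixel_stream, width, k)        # second line buffer
    results = []
    for m, win in enumerate(matrix_buffer(columns, k)):        # matrix buffer windows
        if (m + k - 1) % width >= k - 1:                       # drop windows that wrap a row boundary
            results.append(conv_kernel(win, weights))          # convolution operation circuit
    out_w = width - k + 1                                      # (W-K+1) outputs per row
    rows = [results[i:i + out_w] for i in range(0, len(results), out_w)]
    return list(max_pool_2x2(rows)) if use_pooling else rows   # optional pooling circuit
```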
This embodiment also provides an artificial intelligence processor, which includes the convolver 100 described above. The convolver 100 has been described in detail above and is not described again here.
The artificial intelligence processor comprises a programmable logic circuit (PL) and a processing system circuit (PS). The processing system circuit includes a central processing unit, which can be implemented by an MCU, SoC, FPGA or DSP, for example an ARM-architecture embedded processor chip. The central processing unit is communicatively connected to the external memory; the external memory 200 is, for example, a RAM or ROM memory, such as third- or fourth-generation DDR SDRAM. The central processing unit can read data from and write data to the external memory.
In conclusion acoustic convolver of the invention is cached by parameter register, input buffer, convolution algorithm circuit, output Device, the hardware such as pond computing circuit and first in, first out data-interface composition, can the high convolutional Neural net of high speed processing complexity Network algorithm can effectively solve to realize that bring processing speed is slack-off by software operation in the prior art, to processor performance Demanding problem.So the present invention effectively overcomes various shortcoming in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit the invention. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall be covered by the claims of the invention.

Claims (10)

1. A convolver, electrically connected to an external memory, wherein the external memory stores data to be processed and weight parameters, characterized in that the convolver comprises: a parameter register, an input buffer, a convolution operation circuit and an output buffer;
the parameter register is used to receive and output the weight parameters;
the input buffer comprises multiple connected line buffers used to receive and output the data to be processed; wherein, each time every line buffer has output one datum, the outputs are gathered to form one column of data;
the convolution operation circuit is used to receive the data to be processed from the input buffer, receive the weight parameters from the parameter register, perform the convolution operation accordingly and output a convolution result;
the output buffer is used to receive the convolution result and output the convolution result to the external memory.
2. The convolver according to claim 1, characterized in that the input buffer comprises:
a first line buffer, which receives the pixel data of the feature map to be processed one datum at a time, outputs rows of pixel data in parallel after the filter stage, and stores the feature map of each convolutional layer that is input; wherein the number of data in each pixel row equals the number of parallel filters.
3. The convolver according to claim 2, characterized in that the first line buffer outputs the row pixel data of each convolutional layer in sequence, and while outputting the row pixel data of each convolutional layer it outputs the row pixel data of each channel in sequence.
4. The convolver according to claim 2, characterized in that the input buffer further comprises:
at least one second line buffer, which includes multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map; wherein each row of pixel data is stored into the FIFO memories in turn along the path formed by the series-connected FIFO memories; the second line buffer outputs pixel data in a Pf × K matrix form; wherein Pf is the number of parallel filters and K is the size of the convolution kernel.
5. The convolver according to claim 4, characterized in that the input buffer further comprises:
at least one matrix buffer, each matrix buffer consisting of multiple registers arranged in a matrix for storing pixel data, the size of the register array being Pf × K × 2; when the number of columns of the input pixel data exceeds K, the matrix buffer outputs pixel data in a Pf × K × K matrix form.
6. The convolver according to claim 5, characterized in that the convolution operation circuit comprises:
multiple convolution kernels running in parallel, each convolution kernel comprising multipliers for performing the convolution operation;
an adder tree, which accumulates the output results of the multipliers;
wherein each convolver takes K × K matrix-form pixel data as input and, based on the input pixel data and the weight parameters, outputs pixel data one datum at a time through the convolution operation.
7. The convolver according to claim 6, characterized in that the output buffer comprises:
two parallel FIFO memories, wherein the channel data of the same filter are stored into the same FIFO memory after being accumulated;
a data selector, which feeds each accumulated result back to the adder tree until the adder outputs the final accumulation result.
8. The convolver according to claim 1, characterized in that the convolver further comprises:
a pooling circuit, connected between the output buffer and the external memory, for pooling the convolution result and then outputting it to the external memory.
9. The convolver according to claim 1, characterized in that the internal components of the convolver are connected to one another, and the convolver is connected to the external memory, through first-in first-out data interfaces.
10. An artificial intelligence processor, characterized in that the artificial intelligence processor comprises the convolver according to any one of claims 1 to 9.
CN201880002156.XA 2018-01-15 2018-01-15 Convolver and artificial intelligence processor using the same Pending CN109416756A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/072678 WO2019136764A1 (en) 2018-01-15 2018-01-15 Convolutor and artificial intelligent processing device applied thereto

Publications (1)

Publication Number Publication Date
CN109416756A true CN109416756A (en) 2019-03-01

Family

ID=65462114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880002156.XA Pending CN109416756A (en) 2018-01-15 2018-01-15 Acoustic convolver and its applied artificial intelligence process device

Country Status (2)

Country Link
CN (1) CN109416756A (en)
WO (1) WO2019136764A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978161A (en) * 2019-03-08 2019-07-05 吉林大学 A kind of general convolution-pond synchronization process convolution kernel system
CN109992225A (en) * 2019-04-04 2019-07-09 北京中科寒武纪科技有限公司 Data output method and relevant apparatus
CN110866597A (en) * 2019-09-27 2020-03-06 珠海博雅科技有限公司 Data processing circuit and data processing method
CN111814675A (en) * 2020-07-08 2020-10-23 上海雪湖科技有限公司 Convolutional neural network characteristic diagram assembling system based on FPGA supporting dynamic resolution
CN112101178A (en) * 2020-09-10 2020-12-18 电子科技大学 Intelligent SOC terminal assisting blind people in perceiving external environment
WO2021072732A1 (en) * 2019-10-18 2021-04-22 北京希姆计算科技有限公司 Matrix computing circuit, apparatus and method
CN112784973A (en) * 2019-11-04 2021-05-11 北京希姆计算科技有限公司 Convolution operation circuit, device and method
WO2022095632A1 (en) * 2020-11-06 2022-05-12 苏州浪潮智能科技有限公司 Method and apparatus for implementing data convolution operation on basis of fpga, and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727633A (en) * 2019-09-17 2020-01-24 广东高云半导体科技股份有限公司 Edge artificial intelligence computing system framework based on SoC FPGA
CN111047010A (en) * 2019-11-25 2020-04-21 天津大学 Method and device for reducing first-layer convolution calculation delay of CNN accelerator
TWI766568B (en) * 2020-04-17 2022-06-01 神盾股份有限公司 Processing device for executing convolution neural network computation and operation method thereof

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
US20160379109A1 (en) * 2015-06-29 2016-12-29 Microsoft Technology Licensing, Llc Convolutional neural networks on hardware accelerators
EP3153996A2 (en) * 2015-10-07 2017-04-12 Altera Corporation Method and apparatus for implementing layers on a convolutional neural network accelerator
CN106970896A (en) * 2017-03-30 2017-07-21 中国人民解放军国防科学技术大学 The vectorization implementation method of the two-dimensional matrix convolution of vector processor-oriented
US20170236053A1 (en) * 2015-12-29 2017-08-17 Synopsys, Inc. Configurable and Programmable Multi-Core Architecture with a Specialized Instruction Set for Embedded Application Based on Neural Networks
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN107392309A (en) * 2017-09-11 2017-11-24 东南大学—无锡集成电路技术研究所 A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA
CN107454966A (en) * 2015-05-21 2017-12-08 谷歌公司 Weight is prefetched for neural network processor

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017129325A1 (en) * 2016-01-29 2017-08-03 Fotonation Limited A convolutional neural network
US10497089B2 (en) * 2016-01-29 2019-12-03 Fotonation Limited Convolutional neural network
GB201607713D0 (en) * 2016-05-03 2016-06-15 Imagination Tech Ltd Convolutional neural network

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107454966A (en) * 2015-05-21 2017-12-08 谷歌公司 Weight is prefetched for neural network processor
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
US20160379109A1 (en) * 2015-06-29 2016-12-29 Microsoft Technology Licensing, Llc Convolutional neural networks on hardware accelerators
WO2017003887A1 (en) * 2015-06-29 2017-01-05 Microsoft Technology Licensing, Llc Convolutional neural networks on hardware accelerators
EP3153996A2 (en) * 2015-10-07 2017-04-12 Altera Corporation Method and apparatus for implementing layers on a convolutional neural network accelerator
US20170236053A1 (en) * 2015-12-29 2017-08-17 Synopsys, Inc. Configurable and Programmable Multi-Core Architecture with a Specialized Instruction Set for Embedded Application Based on Neural Networks
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN106970896A (en) * 2017-03-30 2017-07-21 中国人民解放军国防科学技术大学 The vectorization implementation method of the two-dimensional matrix convolution of vector processor-oriented
CN107392309A (en) * 2017-09-11 2017-11-24 东南大学—无锡集成电路技术研究所 A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGXIANG FAN et al.: "F-C3D: FPGA-based 3-Dimensional Convolutional Neural Network", 2017 27th International Conference on Field Programmable Logic and Applications (FPL) *
LI DU et al.: "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things", IEEE Transactions on Circuits and Systems I: Regular Papers *
LU Zhijian: "Research on FPGA-based parallel architectures for convolutional neural networks", China Master's and Doctoral Dissertations Full-text Database (Doctoral), Information Science and Technology *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978161A (en) * 2019-03-08 2019-07-05 吉林大学 A kind of general convolution-pond synchronization process convolution kernel system
CN109978161B (en) * 2019-03-08 2022-03-04 吉林大学 Universal convolution-pooling synchronous processing convolution kernel system
CN109992225A (en) * 2019-04-04 2019-07-09 北京中科寒武纪科技有限公司 Data output method and relevant apparatus
CN109992225B (en) * 2019-04-04 2022-02-22 中科寒武纪科技股份有限公司 Data output method and related device
CN110866597A (en) * 2019-09-27 2020-03-06 珠海博雅科技有限公司 Data processing circuit and data processing method
WO2021072732A1 (en) * 2019-10-18 2021-04-22 北京希姆计算科技有限公司 Matrix computing circuit, apparatus and method
CN112784973A (en) * 2019-11-04 2021-05-11 北京希姆计算科技有限公司 Convolution operation circuit, device and method
CN111814675A (en) * 2020-07-08 2020-10-23 上海雪湖科技有限公司 Convolutional neural network characteristic diagram assembling system based on FPGA supporting dynamic resolution
CN111814675B (en) * 2020-07-08 2023-09-29 上海雪湖科技有限公司 Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA
CN112101178A (en) * 2020-09-10 2020-12-18 电子科技大学 Intelligent SOC terminal assisting blind people in perceiving external environment
CN112101178B (en) * 2020-09-10 2023-03-24 电子科技大学 Intelligent SOC terminal assisting blind people in perceiving external environment
WO2022095632A1 (en) * 2020-11-06 2022-05-12 苏州浪潮智能科技有限公司 Method and apparatus for implementing data convolution operation on basis of fpga, and medium

Also Published As

Publication number Publication date
WO2019136764A1 (en) 2019-07-18

Similar Documents

Publication Publication Date Title
CN109416756A (en) Convolver and artificial intelligence processor using the same
US11720523B2 (en) Performing concurrent operations in a processing element
CN110458279B (en) FPGA-based binary neural network acceleration method and system
JP6960700B2 (en) Multicast Network On-Chip Convolutional Neural Network Hardware Accelerator and Its Behavior
CN107704923A (en) Convolutional neural networks computing circuit
CN107341544A (en) A kind of reconfigurable accelerator and its implementation based on divisible array
CN108733348B (en) Fused vector multiplier and method for performing operation using the same
CN108108809A (en) A kind of hardware structure and its method of work that acceleration is made inferences for convolutional Neural metanetwork
CN106022468A (en) Artificial neural network processor integrated circuit and design method therefor
CN110348574A (en) A kind of general convolutional neural networks accelerating structure and design method based on ZYNQ
Dundar et al. Memory access optimized routing scheme for deep networks on a mobile coprocessor
CN109496319A (en) Artificial intelligence process device hardware optimization method, system, storage medium, terminal
CN109416755A (en) Artificial intelligence method for parallel processing, device, readable storage medium storing program for executing and terminal
CN111210019A (en) Neural network inference method based on software and hardware cooperative acceleration
WO2023123919A1 (en) Data processing circuit, data processing method, and related product
CN109740619B (en) Neural network terminal operation method and device for target recognition
Chang et al. VSCNN: Convolution neural network accelerator with vector sparsity
CN108256640A (en) Convolutional neural networks implementation method
CN110059809A (en) A kind of computing device and Related product
CN110178146A (en) Deconvolution device and its applied artificial intelligence process device
CN214586992U (en) Neural network accelerating circuit, image processor and three-dimensional imaging electronic equipment
CN108256637A (en) A kind of cellular array three-dimensional communication transmission method
CN109740729A (en) Operation method, device and Related product
CN109416743A (en) A kind of Three dimensional convolution device artificially acted for identification
CN109542513A (en) A kind of convolutional neural networks instruction data storage system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190301