CN109416756A - Convolver and artificial intelligence processor using the same - Google Patents
Convolver and artificial intelligence processor using the same
- Publication number: CN109416756A (Application CN201880002156.XA)
- Authority: CN (China)
- Prior art keywords: data, row, convolver, convolution operation, buffer
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F5/06 — Methods or arrangements for data conversion, without changing the order or content of the data handled, for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
- G06F5/10 — Such arrangements having a sequence of storage locations each individually accessible for both enqueue and dequeue operations, e.g. using random access memory
- G06F7/38 — Methods or arrangements for performing computations using exclusively denominational number representation, e.g. binary, ternary, decimal
- G06F7/48 — Such computations using non-contact-making devices, e.g. tube, solid-state device
- G06F7/57 — Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483–G06F7/556 or for performing logical operations
- G06N3/02 — Neural networks
- G06N3/04 — Architecture, e.g. interconnection topology
- G06N3/045 — Combinations of networks
Abstract
A convolver (100) and an artificial intelligence processor using the same. The convolver is electrically connected to an external memory that stores the data to be processed and the weight parameters. The convolver (100) comprises a parameter register (110), an input buffer, a convolution operation circuit (150), and an output buffer (160). The parameter register (110) receives and outputs the weight parameters. The input buffer comprises multiple connected line buffers for receiving and outputting the data to be processed; each time the line buffers each output one datum, those outputs are gathered into a column of data. The convolution operation circuit (150) receives the data to be processed from the input buffer and the weight parameters from the parameter register (110), performs the convolution operation accordingly, and outputs the convolution result. The output buffer (160) receives the convolution result and outputs it to the external memory. This design solves the prior-art problem that software implementations of convolution slow down processing and place high demands on processor performance.
Description
Technical field
The present invention relates to the field of processor technology, and more particularly to artificial intelligence processors, specifically to a convolver and an artificial intelligence processor using the same.
Background art
A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a limited receptive field; CNNs perform outstandingly on large-scale image processing. A convolutional neural network comprises convolutional layers and pooling layers.
CNNs have become a research hotspot in many scientific fields, especially in pattern classification: because the network avoids complex image pre-processing and can take raw images directly as input, it has found wide application.
In general, the basic structure of a CNN has two kinds of layers. The first is the feature-extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which the local feature is extracted; once a local feature is extracted, its positional relationship to the other features is also determined. The second is the feature-mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in a plane share equal weights. The feature-mapping structure uses the sigmoid function, whose influence-function kernel is small, as the activation function of the convolutional network, so that the feature maps are shift-invariant. Moreover, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer of a convolutional neural network is followed by a computational layer that computes local averages and performs a second feature extraction; this characteristic two-stage feature-extraction structure reduces the feature resolution.
CNNs are mainly used to recognize two-dimensional patterns that are invariant to shift, scaling, and other forms of distortion. Because the feature-detection layers of a CNN learn from training data, explicit feature extraction is avoided when a CNN is used; features are learned implicitly from the training data. Furthermore, because the neurons on one feature-mapping plane share the same weights, the network can learn in parallel, a major advantage of convolutional networks over networks in which neurons are fully interconnected. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing; its layout is closer to that of real biological neural networks, weight sharing reduces the complexity of the network, and in particular a multi-dimensional input image can be fed directly into the network, avoiding the complexity of data reconstruction during feature extraction and classification.
At present, convolutional neural networks are implemented by software running on one processor or on multiple distributed processors. As the complexity of convolutional neural networks increases, processing slows down accordingly, and the performance demanded of the processors grows ever higher.
Summary of the invention
In view of the above deficiencies of the prior art, the purpose of the present invention is to provide a convolver and an artificial intelligence processor using the same, to solve the prior-art problem that implementing convolutional neural networks in software slows down processing and places high demands on processor performance.
To achieve the above and other related objects, the present invention provides a convolver electrically connected to an external memory, wherein the external memory stores the data to be processed and the weight parameters. The convolver comprises a parameter register, an input buffer, a convolution operation circuit, and an output buffer. The parameter register receives and outputs the weight parameters. The input buffer comprises multiple connected line buffers for receiving and outputting the data to be processed; each time the line buffers each output one datum, those outputs are gathered into a column of data. The convolution operation circuit receives the data to be processed from the input buffer and the weight parameters from the parameter register, performs the convolution operation accordingly, and outputs the convolution result. The output buffer receives the convolution result and outputs it to the external memory.
In one embodiment of the invention, the input buffer comprises a first line buffer, which receives the pixel data of the feature map to be processed one datum at a time, simultaneously outputs row pixel data for the filters, and stores the feature map input to each convolutional layer; the number of data items per output row equals the number of parallel filters.
In one embodiment of the invention, the first line buffer outputs the row pixel data of each convolutional layer in sequence, and when outputting the row pixel data of a given convolutional layer it outputs the row pixel data of each channel in sequence.
In one embodiment of the invention, the input buffer further comprises at least one second line buffer, comprising multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map; each row of pixel data is stored into the FIFO memories in turn along the path formed by the series connection. The second line buffer outputs pixel data as a Pf × K matrix, where Pf is the number of parallel filters and K is the size of the convolution kernel.
In one embodiment of the invention, the input buffer further comprises at least one matrix buffer, each matrix buffer consisting of multiple registers arranged as a matrix for storing pixel data; the size of the register matrix is Pf × K × 2. When the number of columns of input pixel data exceeds K, the matrix buffer outputs pixel data as a Pf × K × K matrix.
In one embodiment of the invention, the convolution operation circuit comprises multiple convolution kernels running in parallel, each convolution kernel containing multipliers for performing the convolution operation, and an adder tree that accumulates the outputs of the multipliers. Each convolver takes K × K matrix-form pixel data as input and, from the input pixel data and the weight parameters, outputs pixel data one datum at a time through the convolution operation.
In one embodiment of the invention, the output buffer comprises two parallel FIFO memories, into the same one of which the accumulated channel data of the same filter are stored, and a data selector that feeds each accumulated partial result back to the adder tree until the adder outputs the final accumulated result.
In one embodiment of the invention, the convolver further comprises a pooling circuit connected between the output buffer and the external memory, which pools the convolution result before outputting it to the external memory.
In one embodiment of the invention, the internal components of the convolver are connected to one another, and the convolver is connected to the external memory, through first-in-first-out data interfaces.
The present invention also provides an artificial intelligence processor comprising the convolver described above.
As described above, the convolver and the artificial intelligence processor of the invention have the following beneficial effects: the convolver is composed of hardware such as a parameter register, an input buffer, a convolution operation circuit, an output buffer, a pooling circuit, and first-in-first-out data interfaces, and can process highly complex convolutional neural network algorithms at high speed, effectively solving the prior-art problem that software implementations slow down processing and place high demands on processor performance.
Brief description of the drawings
Fig. 1 is an overall schematic diagram of a convolver.
Fig. 2 is an input/output schematic diagram of a convolver of the invention.
Fig. 3 is a schematic diagram of the second line buffer in a convolver of the invention.
Fig. 4 is an input/output schematic diagram of the matrix buffer in a convolver of the invention.
Fig. 5 is a structural schematic diagram of the output buffer in a convolver of the invention.
Reference numerals
100 convolver
110 parameter register
120 first line buffer
130 second line buffer
140 matrix buffer
150 convolution operation circuit
160 output buffer
170 pooling circuit
Detailed description of the embodiments
The embodiments of the present invention are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the disclosure of this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways from different viewpoints and for different applications without departing from the spirit of the invention. Note that, where no conflict arises, the following embodiments and the features within them can be combined with one another.
Note that the drawings provided in the following embodiments (Figs. 1 to 5) illustrate the basic concept of the invention only schematically: they show only the components relevant to the invention rather than the actual number, shape, and size of components in a real implementation, in which the type, quantity, and proportion of each component may vary arbitrarily and the component layout may be considerably more complex.
The purpose of this embodiment is to provide a convolver and an artificial intelligence processor using the same, to solve the prior-art problem that implementing convolutional neural networks in software slows down processing and places high demands on processor performance. The principles and implementation of the convolver and the artificial intelligence processor of this embodiment are described in detail below, so that those skilled in the art can understand them without creative work.
Specifically, as shown in Fig. 1, this embodiment provides a convolver 100 electrically connected to an external memory, wherein the external memory stores the data to be processed and the weight parameters. The convolver 100 comprises a parameter register 110, an input buffer, a convolution operation circuit 150, and an output buffer 160.
The data to be processed comprise multiple channels of data; the weight parameters comprise multiple layers of sub-parameters, each layer of sub-parameters corresponding one-to-one to a channel. Multiple convolution operation circuits 150 are provided, so that the convolution results of the respective channels can be computed in parallel.
In this embodiment, the parameter register 110 (Con_reg in Fig. 2) receives and outputs the weight parameters (Weight in Fig. 2). The parameter register 110 comprises a FIFO memory in which the weight parameters are stored. The configuration parameters of the input buffer, the convolution operation circuit 150, and the output buffer 160 are also stored in the parameter register 110 once configured.
In this embodiment, the input buffer comprises multiple connected line buffers for receiving and outputting the data to be processed; each time the line buffers each output one datum, those outputs are gathered into a column of data.
The input buffer comprises the first line buffer 120 (Conv_in_cache in Fig. 2), the second line buffer 130 (Conv_in_buffer in Fig. 2), and the matrix buffer 140 (Con_in_matrix in Fig. 2). Together, these stages take pixel data 1×1 at a time and produce Pv × K² pixel data as output, where Pv is the row-vector width and K is the size of the convolution kernel. The input buffer is described in detail below.
Specifically, in this embodiment, the first line buffer 120 (Conv_in_cache in Fig. 2) receives the pixel data of the feature map to be processed one datum at a time, simultaneously outputs row pixel data for the filters, and stores the feature map input to each convolutional layer; the number of data items per output row equals the number of parallel filters.
In this embodiment, the first line buffer 120 comprises a BRAM; the input pixel data of each convolutional layer's feature map are cached in the BRAM to improve the locality of pixel accesses.
In this embodiment, the first line buffer 120 outputs the row pixel data of each convolutional layer in sequence, and when outputting the row pixel data of a given convolutional layer it outputs the row pixel data of each channel in sequence. That is, the first line buffer 120 first outputs the pixel data of the first channel; when the pixel data of the first channel have been processed, it begins outputting the pixel data of the second channel; and once the pixel data of all channels of a convolutional layer have been output, it proceeds to output the channel pixel data of the next convolutional layer. In this way, the first line buffer 120 can iterate the computation with different filters from the first convolutional layer through the last.
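The layer-by-layer, channel-by-channel readout order described above can be sketched behaviorally. This is an illustrative Python model, not the patent's hardware; all names are assumptions:

```python
# Behavioral sketch of the first line buffer's readout order: for each
# convolutional layer, drain all rows of channel 0, then channel 1, and so
# on, before moving to the next layer.
def readout_order(layers):
    """layers: list of feature maps, each shaped [channels][rows][cols]."""
    for layer_idx, feature_map in enumerate(layers):
        for ch_idx, channel in enumerate(feature_map):
            for row in channel:
                yield layer_idx, ch_idx, row

# Layer 0 has 2 channels of 2 rows each; layer 1 has 1 channel of 1 row.
layers = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    [[[9, 10]]],
]
order = [(l, c) for l, c, _ in readout_order(layers)]
# Channel 0 of layer 0 drains completely before channel 1 begins.
assert order == [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0)]
```

The generator mirrors the buffer's sequencing only; in hardware the rows stream out of the BRAM rather than being held as nested lists.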
In this embodiment, the input buffer further comprises at least one second line buffer 130. As shown in Fig. 3, the second line buffer 130 comprises multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map; each row of pixel data is stored into the FIFO memories in turn along the path formed by the series connection. The second line buffer 130 receives Pf rows of pixel data and outputs pixel data as a Pf × K matrix, where Pf is the number of parallel filters and K is the size of the convolution kernel.
The first row of pixel data is stored in the first FIFO; the first FIFO then begins receiving the second row of pixel data while passing the first row on to the second FIFO. In this way, the two FIFOs hold two consecutive rows of pixel data and output them simultaneously.
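The chained-FIFO behavior above, where each FIFO holds one row so that vertically aligned pixels from K consecutive rows emerge together as a K-tall column, can be sketched as follows. This is an illustrative model assuming stride-1 readout; the names are not from the patent:

```python
from collections import deque

def line_buffer_columns(image, K):
    """image: list of rows streamed in order. In hardware, K-1 single-row
    FIFOs plus the live row give access to the K most recent rows; here a
    deque with maxlen=K models that window. Once K rows have arrived, each
    column index yields a K-tall column of vertically aligned pixels."""
    window = deque(maxlen=K)          # the K most recent rows
    for row in image:
        window.append(row)
        if len(window) == K:
            for c in range(len(row)):
                yield c, [window[r][c] for r in range(K)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
cols = [col for _, col in line_buffer_columns(image, K=3)]
assert cols == [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

These columns are exactly the Pf × K-shaped slices the text says the second line buffer feeds to the matrix buffer (here shown for a single filter lane).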
In this embodiment, the input buffer further comprises at least one matrix buffer 140. Each matrix buffer 140 consists of multiple registers arranged as a matrix for storing pixel data; the size of the register matrix is Pf × K × 2 (Fig. 4 shows the registers for K = 3). The matrix buffer 140 takes Pf × K matrix-form pixel data as input; once the number of input pixel columns exceeds K, the matrix buffer 140 outputs pixel data as a Pf × K × K matrix.
In this embodiment, the convolution operation circuit 150 receives the data to be processed from the input buffer and the weight parameters from the parameter register 110, performs the convolution operation accordingly, and outputs the convolution result.
Specifically, in this embodiment, the convolution operation circuit 150 comprises multiple convolution kernels running in parallel, each containing multipliers for performing the convolution operation, and an adder tree that accumulates the outputs of the multipliers. Each convolver 100 takes K × K matrix-form pixel data as input and, from the input pixel data and the weight parameters, outputs pixel data one datum at a time through the convolution operation.
For example, an image has three channels of data, R, G, and B, i.e. three two-dimensional matrices. Suppose the weight parameters of a filter have depth 3, i.e. three layers of sub-parameters, again three two-dimensional matrices, each of size K × K, and suppose K is the odd number 3. Each layer is convolved with its corresponding channel. If a data cube of Pv × K × 3 values is taken from the data to be processed (with Pv > K; suppose Pv is 5), a single convolution operation circuit 150 would need three passes to convolve the filter with the data cube. Preferably, therefore, three convolution operation circuits 150 are provided, one per channel, so that the convolutions of the respective channels can be performed in parallel within one clock cycle.
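The datapath described above, K × K multipliers firing in parallel with their products reduced by an adder tree, can be sketched behaviorally. The pairwise reduction mirrors the tree structure; function names are illustrative assumptions:

```python
def adder_tree(values):
    """Pairwise reduction, mirroring a hardware adder tree: each level sums
    adjacent pairs until one value remains."""
    vals = list(values)
    while len(vals) > 1:
        vals = [vals[i] + (vals[i + 1] if i + 1 < len(vals) else 0)
                for i in range(0, len(vals), 2)]
    return vals[0]

def convolve_window(window, weights):
    """K*K multipliers operate in parallel; the adder tree sums the products
    to produce one output pixel for this window position."""
    products = [w * x for row_w, row_x in zip(weights, window)
                for w, x in zip(row_w, row_x)]
    return adder_tree(products)

window  = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weights = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]  # cross-shaped 3x3 kernel
assert convolve_window(window, weights) == 2 + 4 + 5 + 6 + 8  # = 25
```

In software this is an ordinary dot product; the tree formulation matters in hardware because it turns K² sequential additions into log₂(K²) adder stages.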
In this embodiment, the output buffer 160 receives the convolution result and outputs it to the external memory.
Specifically, the output buffer 160 receives the convolution result of each channel, then accumulates the convolution results of all the channels and stores the accumulated result temporarily in the output buffer 160.
Specifically, in this embodiment, as shown in Fig. 5, the output buffer 160 comprises two parallel FIFO memories, into the same one of which the accumulated channel data of the same filter are stored, and a data selector (MUX) that feeds each accumulated partial result back to the adder tree until the adder outputs the final accumulated result. The number of adders equals Pf × Pv. The data selector (MUX) is also used to throttle the data stream down to 1×1, i.e. one pixel output at a time.
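The accumulate-and-feed-back loop described above, where per-channel convolution results for one filter are summed until every channel has contributed, reduces behaviorally to the following sketch (an illustrative model; in hardware the partial sums circulate through the FIFO and MUX rather than a Python variable):

```python
def accumulate_channels(channel_results):
    """Sketch of the output buffer's role: the per-channel convolution
    results for the same filter are looped back through the adder until
    every channel has been accumulated; only the final sum is emitted."""
    partial = None                     # models the FIFO holding partial sums
    for result in channel_results:     # one result grid per input channel
        if partial is None:
            partial = list(result)
        else:
            partial = [p + r for p, r in zip(partial, result)]
    return partial

# Three channels' convolution outputs at the same filter positions:
r_chan = [1, 2, 3]
g_chan = [10, 20, 30]
b_chan = [100, 200, 300]
assert accumulate_channels([r_chan, g_chan, b_chan]) == [111, 222, 333]
```

Two FIFOs (rather than one) plausibly allow one to drain finished sums while the other accumulates the next filter, but the patent text does not spell this out.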
In this embodiment, the convolver 100 further comprises a pooling circuit 170 connected between the output buffer 160 and the external memory, which pools the convolution result before outputting it to the external memory. The pooling circuit 170 performs maximum pooling over every two rows of pixel data and comprises a FIFO memory for storing one row of pixel data.
Specifically, the pooling mode can be max pooling or average pooling, and can be implemented with logic circuits.
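The two-row max pooling described above can be sketched as a 2×2, stride-2 reduction. In hardware the one-row FIFO holds the previous row while the current row streams in; this model operates directly on a stored array and is illustrative only:

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a list-of-rows feature map."""
    H, W = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c],     feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, W - 1, 2)]
            for r in range(0, H - 1, 2)]

fm = [[1, 3, 2, 4],
      [5, 0, 1, 1],
      [9, 2, 0, 7],
      [3, 8, 6, 5]]
assert max_pool_2x2(fm) == [[5, 4], [9, 7]]
```

Average pooling would replace `max(...)` with the mean of the four values; both reduce each output dimension by half.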
In this embodiment, the internal components of the convolver 100 are connected to one another, and the convolver 100 is connected to the external memory, through first-in-first-out data interfaces (the multiple SIF blocks in Fig. 2).
Specifically, each first-in-first-out data interface comprises a FIFO memory, a first logic unit, and a second logic unit.
The FIFO memory has, on the upstream side, a write-enable pin, a data-in pin, and a memory-full status pin; and, on the downstream side, a read-enable pin, a data-out pin, and a memory-empty status pin.
The first logic unit is connected to the upstream object, the write-enable pin, and the memory-full status pin. When it receives a write request from the upstream object, it determines from the signal on the memory-full status pin whether the FIFO memory is full; if not full, it sends an enable signal to the write-enable pin to make the FIFO memory writable; otherwise, the FIFO memory is made non-writable.
Specifically, the first logic unit comprises a first inverter, whose input is connected to the memory-full status pin and whose output leads to a first flag terminal for connection to the upstream object; and a first AND gate, whose first input is connected to the first data-valid flag terminal, whose second input is connected to an upstream data-valid terminal for connection to the upstream object, and whose output is connected to the write-enable pin.
The second logic unit is connected to the downstream object, the read-enable pin, and the memory-empty status pin. When it receives a read request from the downstream object, it determines from the signal on the memory-empty status pin whether the FIFO memory is empty; if not empty, it sends an enable signal to the read-enable pin to make the FIFO memory readable; otherwise, the FIFO memory is made non-readable.
Specifically, the second logic unit comprises a second inverter, whose input is connected to the memory-empty status pin and whose output leads to a downstream data-valid terminal for connection to the downstream object; and a second AND gate, whose first input is connected to the downstream data-valid terminal and whose second input is connected to a downstream data-valid flag terminal for connection to the downstream object.
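The full/empty handshake implemented by the two logic units, writes accepted only when the FIFO is not full, reads only when it is not empty, can be sketched behaviorally (an illustrative model of the protocol, not the gate-level circuit; names are assumptions):

```python
from collections import deque

class StreamFIFO:
    """Behavioral sketch of the first-in-first-out data interface: the
    write-enable is gated by the full flag (first logic unit) and the
    read-enable by the empty flag (second logic unit)."""
    def __init__(self, depth):
        self.depth = depth
        self.mem = deque()

    @property
    def full(self):
        return len(self.mem) == self.depth

    @property
    def empty(self):
        return len(self.mem) == 0

    def write(self, data):
        """Returns True iff the write was accepted (FIFO not full)."""
        if self.full:
            return False
        self.mem.append(data)
        return True

    def read(self):
        """Returns the oldest datum, or None if the FIFO is empty."""
        return self.mem.popleft() if not self.empty else None

fifo = StreamFIFO(depth=2)
assert fifo.write(1) and fifo.write(2)
assert not fifo.write(3)             # full: write-enable is deasserted
assert fifo.read() == 1 and fifo.read() == 2
assert fifo.read() is None           # empty: read-enable is deasserted
```

The same interface connecting every pair of stages is what lets each block of the pipeline run at its own rate without losing data.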
In this embodiment, the convolver 100 operates as follows.
The data to be processed are read from the external memory through a first-in-first-out data interface (one of the SIF blocks in Fig. 2) and stored in the BRAM of the first line buffer 120 (Conv_in_cache in Fig. 2).
The K × K weight parameters of one channel are read from the external memory through a first-in-first-out data interface (one of the SIF blocks in Fig. 2) and stored in the parameter register 110.
Once the parameter register 110 has been loaded with a weight parameter, the circuit starts receiving and processing the pixel data of the feature map through the first line buffer 120 (Conv_in_cache in Fig. 2), the second line buffer 130 (Conv_in_buffer in Fig. 2), and the matrix buffer 140 (Con_in_matrix in Fig. 2); the convolution operation circuit 150 receives Pv × K² pixel data per clock cycle.
The convolution operation circuit 150 performs the convolution and accumulation on the input data of each channel (the feature map input on each channel being of height H and width W), and then outputs the result of each channel to the output buffer 160. By cycling through the different input channels and accumulating the per-channel results in the output buffer 160, the feature map of size (H − K + 1) × (W − K + 1) corresponding to the filter is obtained.
The pooling circuit 170 can then receive the (H − K + 1) × (W − K + 1) pixel data and output the feature map after pooling, or the feature map can be output directly from the output buffer 160.
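The output dimensions quoted above follow from valid (no-padding, stride-1) convolution; a one-line check, included here purely as a worked example of the formula:

```python
def output_size(H, W, K):
    """Valid, stride-1 convolution: a K x K kernel over an H x W feature map
    yields an (H - K + 1) x (W - K + 1) output, i.e. the map accumulated in
    the output buffer."""
    return H - K + 1, W - K + 1

assert output_size(6, 6, 3) == (4, 4)
# A subsequent 2x2 pooling stage would then halve each output dimension.
```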
When the pooling circuit 170 or the output buffer 160 has output the feature map produced by one filter, the parameter register 110 is reloaded with another weight parameter, and the above pixel-processing flow is iterated with the different filters until the pixel processing of all the convolutional layers is complete.
This embodiment also provides an artificial intelligence processor comprising the convolver 100 described above. The convolver 100 has been described in detail above and is not described again here.
The artificial intelligence processor comprises a programmable logic circuit (PL) and a processing system circuit (PS). The processing system circuit comprises a central processing unit, which can be implemented with an MCU, SoC, FPGA, DSP, or the like, for example an embedded processor chip of the ARM architecture. The central processing unit is communicatively connected to the external memory 200, which is for example a RAM or ROM memory such as DDR3 or DDR4 SDRAM; the central processing unit can read data from and write data to the external memory.
In conclusion acoustic convolver of the invention is cached by parameter register, input buffer, convolution algorithm circuit, output
Device, the hardware such as pond computing circuit and first in, first out data-interface composition, can the high convolutional Neural net of high speed processing complexity
Network algorithm can effectively solve to realize that bring processing speed is slack-off by software operation in the prior art, to processor performance
Demanding problem.So the present invention effectively overcomes various shortcoming in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall be covered by the claims of the invention.
Claims (10)
1. A convolver, electrically connected to an external memory, wherein the external memory stores data to be processed and weight parameters, characterized in that the convolver comprises: a parameter register, an input buffer, a convolution operation circuit, and an output buffer;
the parameter register is configured to receive and output the weight parameters;
the input buffer comprises multiple connected line buffers configured to receive and output the data to be processed, wherein each time the line buffers each output one datum, those outputs are gathered into a column of data;
the convolution operation circuit is configured to receive the data to be processed from the input buffer and the weight parameters from the parameter register, to perform the convolution operation accordingly, and to output the convolution result; and
the output buffer is configured to receive the convolution result and to output the convolution result to the external memory.
2. The convolver according to claim 1, characterized in that the input buffer comprises:
a first line buffer, which receives the pixel data of the feature map to be processed one datum at a time, simultaneously outputs row pixel data for the filters, and stores the feature map input to each convolutional layer, wherein the number of data items per output row equals the number of parallel filters.
3. The convolver according to claim 2, characterized in that the first line buffer outputs the row pixel data of each convolutional layer in sequence, and when outputting the row pixel data of a given convolutional layer it outputs the row pixel data of each channel in sequence.
4. The convolver according to claim 2, characterized in that the input buffer further comprises:
at least one second line buffer, comprising multiple FIFO memories connected in series, each FIFO memory storing one row of pixel data of the feature map, wherein each row of pixel data is stored into the FIFO memories in turn along the path formed by the series connection; the second line buffer outputs pixel data as a Pf × K matrix, where Pf is the number of parallel filters and K is the size of the convolution kernel.
5. The convolver according to claim 4, wherein the input buffer further comprises:
at least one matrix buffer, each matrix buffer comprising a plurality of registers arranged in a matrix for storing pixel data, the register array being of size Pf × K × 2; when the number of columns of the input pixel data exceeds K, the matrix buffer outputs pixel data in Pf × K × K matrix form.
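For illustration, the matrix buffer of claim 5 can be sketched as follows, again with Pf = 1; the claimed Pf × K × 2 register array allows double buffering, for which a simple K-column deque stands in here:

```python
from collections import deque

def matrix_buffer(columns, K):
    """Behavioral model of the claimed matrix buffer: K-tall columns from the
    row buffer shift into a register window; once K columns are resident
    (i.e. the input is at least K columns wide), a full K x K pixel window
    is emitted each cycle."""
    regs = deque(maxlen=K)        # the most recent K columns
    windows = []
    for col in columns:
        regs.append(col)
        if len(regs) == K:
            # transpose the stored columns into a row-major K x K window
            windows.append([[regs[c][r] for c in range(K)] for r in range(K)])
    return windows
```

Feeding it the columns produced by a line buffer yields one K × K window per cycle after the initial fill, which is the input rate the convolution kernels of claim 6 expect.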
6. The convolver according to claim 5, wherein the convolution operation circuit comprises:
a plurality of convolution kernels operating in parallel, each convolution kernel comprising multipliers for performing the convolution operation; and
an adder tree that accumulates the output results of the multipliers;
each convolution kernel receives K × K matrix-form pixel data as input and outputs pixel data one datum at a time by convolving the input pixel data with the weight parameters.
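For illustration, one convolution kernel of claim 6 reduces to a bank of multipliers feeding a pairwise adder tree; a minimal sketch:

```python
def adder_tree(values):
    """Pairwise reduction, mirroring a hardware adder tree (log2 N levels)."""
    values = list(values)
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

def conv_kernel(window, weights):
    """One convolution kernel as claimed: K*K parallel multipliers whose
    products the adder tree accumulates into a single output pixel."""
    products = [p * w
                for wrow, krow in zip(window, weights)
                for p, w in zip(wrow, krow)]   # the parallel multiplier bank
    return adder_tree(products)
```

The pairwise structure matters in hardware: it gives a latency of ⌈log2(K·K)⌉ adder stages instead of the K·K stages of a sequential accumulator.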
7. The convolver according to claim 6, wherein the output buffer comprises:
two parallel FIFO memories, the channel data of a same filter being stored in the same FIFO memory after accumulation; and
a data selector for feeding each accumulated result back to the adder tree until the adder tree outputs the final accumulation result.
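For illustration, the channel accumulation of claim 7 can be sketched behaviorally: the running sums held in one FIFO are fed back through the adder (the data selector's feedback path) while the other FIFO fills, the two alternating ping-pong fashion:

```python
from collections import deque

def accumulate_channels(channel_partials):
    """Behavioral model of the claimed output buffer, where
    channel_partials[ch][i] is the i-th output pixel contributed by input
    channel ch of one filter. Partial sums circulate between two FIFOs,
    one draining through the feedback adder while the other fills."""
    fifos = [deque(channel_partials[0]), deque()]   # first channel seeds FIFO 0
    src = 0
    for ch in channel_partials[1:]:
        dst = 1 - src                               # swap roles of the two FIFOs
        for new in ch:
            # feedback: stored partial sum + next channel's contribution
            fifos[dst].append(fifos[src].popleft() + new)
        src = dst
    return list(fifos[src])
```

After the last channel has passed through, the surviving FIFO holds the fully accumulated output pixels of that filter.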
8. The convolver according to claim 1, wherein the convolver further comprises:
a pooling operation circuit connected between the output buffer and the external memory, for pooling the convolution result before outputting it to the external memory.
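For illustration, the pooling stage of claim 8 reduces each non-overlapping P × P block of the convolution result to one value before the write-out; max pooling is assumed here, as the claim does not fix the pooling type:

```python
def max_pool(feature_map, P=2):
    """Behavioral model of the claimed pooling circuit between the output
    buffer and external memory: non-overlapping P x P blocks of the
    convolution result are each reduced to their maximum."""
    H, W = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i + di][j + dj]
                 for di in range(P) for dj in range(P))
             for j in range(0, W - P + 1, P)]
            for i in range(0, H - P + 1, P)]
```

Placing this circuit before the external-memory interface cuts the write-back bandwidth by a factor of P², which is the usual motivation for fusing pooling into the accelerator pipeline.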
9. The convolver according to claim 1, wherein the internal components of the convolver are connected to one another, and the convolver is connected to the external memory, through first-in, first-out (FIFO) data interfaces.
10. An artificial intelligence processor, comprising the convolver according to any one of claims 1 to 9.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/072678 WO2019136764A1 (en) | 2018-01-15 | 2018-01-15 | Convolutor and artificial intelligent processing device applied thereto |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109416756A true CN109416756A (en) | 2019-03-01 |
Family
ID=65462114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880002156.XA Pending CN109416756A (en) | 2018-01-15 | 2018-01-15 | Convolver and artificial intelligence processing device applied thereto |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109416756A (en) |
WO (1) | WO2019136764A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110727633A (en) * | 2019-09-17 | 2020-01-24 | 广东高云半导体科技股份有限公司 | Edge artificial intelligence computing system framework based on SoC FPGA |
CN111047010A (en) * | 2019-11-25 | 2020-04-21 | 天津大学 | Method and device for reducing first-layer convolution calculation delay of CNN accelerator |
TWI766568B (en) * | 2020-04-17 | 2022-06-01 | 神盾股份有限公司 | Processing device for executing convolution neural network computation and operation method thereof |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017129325A1 (en) * | 2016-01-29 | 2017-08-03 | Fotonation Limited | A convolutional neural network |
US10497089B2 (en) * | 2016-01-29 | 2019-12-03 | Fotonation Limited | Convolutional neural network |
GB201607713D0 (en) * | 2016-05-03 | 2016-06-15 | Imagination Tech Ltd | Convolutional neural network |
2018
- 2018-01-15 WO PCT/CN2018/072678 patent/WO2019136764A1/en active Application Filing
- 2018-01-15 CN CN201880002156.XA patent/CN109416756A/en active Pending
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107454966A (en) * | 2015-05-21 | 2017-12-08 | 谷歌公司 | Prefetching weights for a neural network processor |
CN104915322A (en) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | Method for accelerating convolutional neural network hardware and AXI bus IP core thereof |
US20160379109A1 (en) * | 2015-06-29 | 2016-12-29 | Microsoft Technology Licensing, Llc | Convolutional neural networks on hardware accelerators |
WO2017003887A1 (en) * | 2015-06-29 | 2017-01-05 | Microsoft Technology Licensing, Llc | Convolutional neural networks on hardware accelerators |
EP3153996A2 (en) * | 2015-10-07 | 2017-04-12 | Altera Corporation | Method and apparatus for implementing layers on a convolutional neural network accelerator |
US20170236053A1 (en) * | 2015-12-29 | 2017-08-17 | Synopsys, Inc. | Configurable and Programmable Multi-Core Architecture with a Specialized Instruction Set for Embedded Application Based on Neural Networks |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | FPGA-based deep convolutional neural network implementation method |
CN107229967A (en) * | 2016-08-22 | 2017-10-03 | 北京深鉴智能科技有限公司 | FPGA-based hardware accelerator and method for sparse GRU neural networks |
CN106970896A (en) * | 2017-03-30 | 2017-07-21 | 中国人民解放军国防科学技术大学 | Vector processor-oriented vectorization implementation method for two-dimensional matrix convolution |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | FPGA-based general fixed-point neural network convolution accelerator hardware architecture |
Non-Patent Citations (3)
Title |
---|
HONGXIANG FAN et al.: "F-C3D: FPGA-based 3-Dimensional Convolutional Neural Network", 2017 27th International Conference on Field Programmable Logic and Applications (FPL) * |
LI DU et al.: "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things", IEEE Transactions on Circuits and Systems I: Regular Papers * |
LU Zhijian: "Research on FPGA-based parallel architectures for convolutional neural networks", China Doctoral Dissertations Full-text Database, Information Science and Technology * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978161A (en) * | 2019-03-08 | 2019-07-05 | 吉林大学 | Universal convolution-pooling synchronous processing convolution kernel system |
CN109978161B (en) * | 2019-03-08 | 2022-03-04 | 吉林大学 | Universal convolution-pooling synchronous processing convolution kernel system |
CN109992225A (en) * | 2019-04-04 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Data output method and relevant apparatus |
CN109992225B (en) * | 2019-04-04 | 2022-02-22 | 中科寒武纪科技股份有限公司 | Data output method and related device |
CN110866597A (en) * | 2019-09-27 | 2020-03-06 | 珠海博雅科技有限公司 | Data processing circuit and data processing method |
WO2021072732A1 (en) * | 2019-10-18 | 2021-04-22 | 北京希姆计算科技有限公司 | Matrix computing circuit, apparatus and method |
CN112784973A (en) * | 2019-11-04 | 2021-05-11 | 北京希姆计算科技有限公司 | Convolution operation circuit, device and method |
CN111814675A (en) * | 2020-07-08 | 2020-10-23 | 上海雪湖科技有限公司 | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA |
CN111814675B (en) * | 2020-07-08 | 2023-09-29 | 上海雪湖科技有限公司 | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA |
CN112101178A (en) * | 2020-09-10 | 2020-12-18 | 电子科技大学 | Intelligent SOC terminal assisting blind people in perceiving external environment |
CN112101178B (en) * | 2020-09-10 | 2023-03-24 | 电子科技大学 | Intelligent SOC terminal assisting blind people in perceiving external environment |
WO2022095632A1 (en) * | 2020-11-06 | 2022-05-12 | 苏州浪潮智能科技有限公司 | Method and apparatus for implementing data convolution operation on basis of fpga, and medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019136764A1 (en) | 2019-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109416756A (en) | Convolver and artificial intelligence processing device applied thereto | |
US11720523B2 (en) | Performing concurrent operations in a processing element | |
CN110458279B (en) | FPGA-based binary neural network acceleration method and system | |
JP6960700B2 (en) | Multicast Network On-Chip Convolutional Neural Network Hardware Accelerator and Its Behavior | |
CN107704923A (en) | Convolutional neural networks computing circuit | |
CN107341544A (en) | Reconfigurable accelerator based on divisible array and implementation method thereof |
CN108733348B (en) | Fused vector multiplier and method for performing operation using the same | |
CN108108809A (en) | Hardware architecture for accelerating convolutional neural network inference and working method thereof |
CN106022468A (en) | Artificial neural network processor integrated circuit and design method therefor | |
CN110348574A (en) | ZYNQ-based general convolutional neural network acceleration structure and design method |
Dundar et al. | Memory access optimized routing scheme for deep networks on a mobile coprocessor | |
CN109496319A (en) | Artificial intelligence processing device hardware optimization method, system, storage medium, and terminal |
CN109416755A (en) | Artificial intelligence parallel processing method and device, readable storage medium, and terminal |
CN111210019A (en) | Neural network inference method based on software and hardware cooperative acceleration | |
WO2023123919A1 (en) | Data processing circuit, data processing method, and related product | |
CN109740619B (en) | Neural network terminal operation method and device for target recognition | |
Chang et al. | VSCNN: Convolution neural network accelerator with vector sparsity | |
CN108256640A (en) | Convolutional neural networks implementation method | |
CN110059809A (en) | A kind of computing device and Related product | |
CN110178146A (en) | Deconvolver and artificial intelligence processing device applied thereto |
CN214586992U (en) | Neural network accelerating circuit, image processor and three-dimensional imaging electronic equipment | |
CN108256637A (en) | Cellular array three-dimensional communication transmission method |
CN109740729A (en) | Operation method, device and related product |
CN109416743A (en) | Three-dimensional convolution device for recognizing human actions |
CN109542513A (en) | Convolutional neural network instruction data storage system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190301 |