CN110178146B - Deconvolutor and artificial intelligence processing device applied by deconvolutor - Google Patents
- Publication number: CN110178146B (application CN201880002766.XA)
- Authority
- CN
- China
- Prior art keywords
- buffer
- deconvolutor
- deconvolution
- data
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Microcomputers (AREA)
- Image Processing (AREA)
- Complex Calculations (AREA)
Abstract
The deconvolutor (100), and the artificial intelligence processing device in which it is applied, is electrically connected to an external memory (200) that stores the data to be processed and the weight parameters. The deconvolutor (100) includes: a parameter buffer (110), an input buffer, a deconvolution operation circuit (140), and an output buffer (150). The parameter buffer (110) receives and outputs the weight parameters. The input buffer includes a plurality of connected line buffers that receive and output the data to be processed; each line buffer assembles the data it receives one element at a time into a full line of data for output. The deconvolution operation circuit (140) receives the data to be processed from the input buffer and the weight parameters from the parameter buffer (110), performs the deconvolution operation according to the weight parameters, and outputs the deconvolution result. The output buffer (150) receives the deconvolution result and outputs it to the external memory (200). The device effectively solves the prior-art problems of slow processing speed and high processor-performance requirements caused by software implementation.
Description
Technical Field
The invention relates to the technical field of processors, in particular to the technical field of artificial intelligence processors, and specifically relates to a deconvolutor and an artificial intelligence processing device applied by the deconvolutor.
Background
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding cells within a limited coverage area; it performs excellently on large-scale image processing. A convolutional neural network includes convolution layers and pooling layers.
CNNs have become a research hotspot in many scientific fields, especially pattern classification, where they are widely used because the network avoids complex image pre-processing and can take the original image directly as input.
In general, the basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which it extracts local features; once a local feature is extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on that plane share equal weights. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the network, so that the feature maps are shift-invariant. In addition, because the neurons on one mapping plane share weights, the number of free parameters in the network is reduced. Each convolution layer in the network is followed by a computational layer for local averaging and secondary extraction; this two-stage feature extraction structure reduces the feature resolution.
CNNs are used primarily to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Since the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided: the features are learned implicitly from the training data. Furthermore, because the neurons on one feature mapping plane share the same weights, the network can learn in parallel, which is a great advantage over networks in which the neurons are fully connected to each other. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing; its layout is closer to that of an actual biological neural network; weight sharing reduces the complexity of the network; and, in particular, the ability to feed a multidimensional input image directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
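As an illustrative aside (not part of the original disclosure), the weight sharing described above can be sketched in a few lines: a single K × K kernel slides over the whole image, so the layer's free parameters are just the kernel entries. The function name `conv2d` is a hypothetical example, not something defined in this patent.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution: one shared kernel slides over the
    whole image, so the layer's free parameters are just kernel.size."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0   # 9 shared weights for the whole feature map
feature_map = conv2d(image, kernel)
print(feature_map.shape)         # (3, 3)
```

A fully connected layer mapping the same 5 × 5 input to a 3 × 3 output would need 225 weights; the shared kernel needs only 9.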
At present, deconvolution neural networks are implemented by software running on a processor, or distributed over several processes. As the complexity of the network increases, the processing speed becomes slower and the performance requirements on the processor grow higher and higher.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a deconvolutor and an artificial intelligence processing device in which the deconvolutor is applied, which solve the prior-art problems of slow processing speed and high processor-performance requirements caused by implementing deconvolution neural networks in software.
To achieve the above and other related objects, the present invention provides a deconvolutor electrically connected to an external memory, wherein the external memory stores data to be processed and weight parameters. The deconvolutor includes: a parameter buffer, an input buffer, a deconvolution operation circuit, and an output buffer. The parameter buffer is used for receiving and outputting the weight parameters. The input buffer includes a plurality of connected line buffers for receiving and outputting the data to be processed, wherein each line buffer assembles the data it receives one element at a time into a full line of data for output. The deconvolution operation circuit is used for receiving the data to be processed from the input buffer, receiving the weight parameters from the parameter buffer, performing the deconvolution operation according to the weight parameters, and outputting the deconvolution operation result. The output buffer is used for receiving the deconvolution operation result and outputting it to the external memory.
In an embodiment of the present invention, the input buffer includes a first line buffer for receiving pixel data of the feature map to be processed bit by bit, outputting one line of pixel data at a time after filtering, and storing the feature maps input to each deconvolution layer.
In an embodiment of the present invention, the first line buffer sequentially outputs the line pixel data of each deconvolution layer and, when outputting the line pixel data of a deconvolution layer, sequentially outputs the line pixel data of each channel.
In an embodiment of the present invention, the input buffer further includes: and the at least one second line buffer is used for acquiring weight parameters of each filter from the external memory and sequentially inputting the weight parameters into the parameter buffer.
In an embodiment of the present invention, the deconvolution operation circuit includes: a plurality of deconvolution kernels running in parallel, each comprising multipliers for performing the deconvolution operations; and an adder tree for accumulating the output results of the multipliers. Each deconvolution kernel takes pixel data in the form of a K × K matrix as input and, through the deconvolution operation on the input pixel data and the weight parameters, outputs pixel data bit by bit.
In an embodiment of the present invention, the output buffer includes: a plurality of parallel FIFO memories, wherein channel data passing through the same filter are accumulated and stored in the same FIFO memory; and a data selector for returning each accumulated result to the adder tree until the adder tree outputs the final accumulated result.
In an embodiment of the invention, the deconvolutor further includes: and the pooling operation circuit is connected between the output buffer and the external memory and is used for pooling the deconvolution operation result and outputting the pooled deconvolution operation result to the external memory.
In an embodiment of the present invention, the internal components of the deconvolutor, as well as the deconvolutor and the external memory, are connected through first-in first-out data interfaces.
The invention also provides an artificial intelligence processing device comprising a deconvolutor as described above.
As described above, the deconvolutor and the artificial intelligence processing device applied by the deconvolutor have the following beneficial effects:
The deconvolutor consists of hardware such as a parameter buffer, an input buffer, a deconvolution operation circuit, an output buffer, a pooling operation circuit, and first-in first-out data interfaces. It can process highly complex deconvolution neural network algorithms at high speed, and effectively solves the prior-art problems of slow processing speed and high processor-performance requirements caused by software implementation.
Drawings
Fig. 1 is a schematic diagram of the overall structure of a deconvolutor according to the present invention.
Fig. 2 shows an input/output schematic diagram of a deconvolutor according to the present invention.
Description of element reference numerals
100. Deconvolutor
110. Parameter buffer
120. First line buffer
130. Second line buffer
140. Deconvolution operation circuit
150. Output buffer
160. Pooling operation circuit
200. External memory
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure below, which describes embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details in this description may be modified or varied in various ways without departing from the spirit of the invention. It should be noted that, in the absence of conflict, the following embodiments and the features in them may be combined with each other.
It should be noted that the illustrations provided in the following embodiments (Figs. 1 and 2) merely illustrate the basic concept of the present invention schematically. The drawings show only the components related to the present invention, rather than the number, shape, and size of the components in an actual implementation; in practice, the form, number, and proportion of each component may vary arbitrarily, and the component layout may be more complex.
The purpose of this embodiment is to provide a deconvolutor and an artificial intelligence processing device in which it is applied, to solve the prior-art problems of slow processing speed and high processor-performance requirements caused by implementing deconvolution neural networks in software. The principle and implementation of the deconvolutor and of the artificial intelligence processing device of this embodiment are described in detail below, so that those skilled in the art can understand them without creative effort.
Specifically, as shown in fig. 1, the present embodiment provides a deconvolutor 100, where the deconvolutor 100 is electrically connected to an external memory 200, and the external memory 200 stores data to be processed and weight parameters. The deconvolutor 100 includes: a parameter buffer 110, an input buffer, a deconvolution operation circuit 140, and an output buffer 150.
The first data to be processed comprises a plurality of channel data; the first weight parameter comprises a plurality of layers of sub-parameters, each layer of sub-parameters corresponding one-to-one to a channel. A plurality of deconvolution operation circuits 140 are provided, so that the deconvolution results of the channels can be computed in parallel, in one-to-one correspondence.
In this embodiment, the parameter buffer 110 (the Con_reg shown in Fig. 2) is configured to receive and output the weight parameters (the Weight shown in Fig. 2). The parameter buffer 110 includes a FIFO memory in which the weight parameters are stored. The configuration parameters for the input buffer, the deconvolution operation circuit 140, and the output buffer 150 are also stored in the parameter buffer 110.
In this embodiment, the input buffer includes a plurality of connected line buffers for receiving and outputting the data to be processed; each line buffer assembles the data it receives one element at a time into a full line of data for output.
The input buffer includes a first line buffer 120 (the RAM shown in Fig. 2) and a second line buffer 130 (the Coef_reg shown in Fig. 2). Together, the first line buffer 120 and the second line buffer 130 turn the 1 × 1 pixel-by-pixel input into K × K pixel data for output, where K is the size of the deconvolution kernel. The input buffer is described in detail below.
Specifically, in this embodiment, the first line buffer 120 receives pixel data of the feature map to be processed bit by bit, outputs one line of pixel data at a time after filtering, and stores the feature maps of each deconvolution layer; the number of pixels in each output line equals the number of parallel filters.
In this embodiment, the first line buffer 120 includes a RAM, and the feature-map input pixel data of each deconvolution layer is buffered in the RAM to improve the locality of pixel-data storage.
In this embodiment, the first line buffer 120 sequentially outputs the line pixel data of each deconvolution layer and, when outputting the line pixel data of a deconvolution layer, sequentially outputs the line pixel data of each channel. That is, the first line buffer 120 first outputs the pixel data of the first channel; when the first channel's pixel data has been processed, it starts outputting the pixel data of the second channel; and when the pixel data of all channels of one deconvolution layer has been output, it moves on to the channel pixel data of the next deconvolution layer. The first line buffer 120 performs this iterative computation from the first deconvolution layer to the last, using different filters.
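The emission order described above — rows within a channel, channels within a layer, then the next layer — can be sketched as a nested loop. This is an illustrative model only; the function name `emission_order` and the flat row count are assumptions, not part of the disclosure.

```python
def emission_order(num_layers, channels_per_layer, rows_per_channel):
    """Order in which the first line buffer emits rows: all rows of
    channel 0, then channel 1, ..., then move to the next layer."""
    order = []
    for layer in range(num_layers):
        for ch in range(channels_per_layer[layer]):
            for row in range(rows_per_channel):
                order.append((layer, ch, row))
    return order

# Two layers, the first with two channels, each channel two rows tall.
print(emission_order(2, [2, 1], 2))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
```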
In this embodiment, the input buffer further includes: at least one second line buffer 130, as shown in fig. 2, the second line buffer 130 includes a FIFO memory, and the second line buffer 130 (coef_reg shown in fig. 2) is used to obtain the weight parameters of each filter from the external memory 200 and sequentially input the weight parameters to the parameter buffer. Wherein the second line buffer 130 is connected to the external memory 200 through a first-in first-out data interface (multiple SIFs shown in fig. 2). The pixel data output from the second line buffer 130 is in the form of a k×k matrix.
In this embodiment, the deconvolution circuit 140 is configured to receive the data to be processed from the input buffer, receive the weight parameter from the parameter buffer 110, perform deconvolution operation according to the received weight parameter, and output a deconvolution operation result.
Specifically, in the present embodiment, the deconvolution operation circuit 140 includes: a plurality of deconvolution kernels running in parallel, each comprising multipliers for performing the deconvolution operations; and an adder tree for accumulating the output results of the multipliers. Each deconvolution kernel takes pixel data in the form of a K × K matrix as input and, through the deconvolution operation on the input pixel data and the weight parameters, outputs pixel data bit by bit.
That is, the deconvolution operation circuit 140 includes a plurality of multipliers, and the weight matrix used by the multipliers is the transpose of the matrix a convolver would use. In each clock cycle, the K × K matrix of input pixel data is multiplied by each column of the transposed matrix to obtain a column of output, and the resulting columns are stored in the K FIFO memories of the output buffer 150, respectively.
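As an illustration of the column-wise output (an interpretive sketch, not the patent's exact datapath): in a transposed convolution, a single input pixel scatters a weighted K × K copy of the kernel into the output, and each column of that contribution can be pushed into its own FIFO. The column-to-FIFO mapping here is an assumption.

```python
import numpy as np

def deconv_pixel_columns(x, weights):
    """For a single input pixel x, transposed convolution scatters
    x * weights (a K x K patch) into the output; column j of the
    patch is what would be pushed into FIFO j."""
    contrib = x * weights                        # K x K contribution
    return [contrib[:, j] for j in range(weights.shape[1])]

W = np.array([[1., 2., 1.],
              [0., 1., 0.],
              [1., 0., 1.]])
cols = deconv_pixel_columns(2.0, W)
print(cols[0])   # [2. 0. 2.]
```

Overlapping contributions from neighboring input pixels would then be accumulated inside the FIFOs, which is what the adder-tree feedback path described below is for.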
For example, an image has three channels of data (R, G, and B), i.e., three two-dimensional matrices. Suppose the first weight parameter, i.e., the depth of the filter, is 3, giving three layers of sub-weight parameters (three two-dimensional matrices), each of size K × K with K an odd number such as 3; each layer is deconvolved with its corresponding channel. When a data cube of Pv × K × 3 (Pv > K, say Pv = 5) is extracted from the first data to be processed, a single deconvolution operation circuit 140 would need three passes to fully process the filter against the data cube. Preferably, therefore, a corresponding number (3) of deconvolution operation circuits 140 are provided, so that the deconvolution operations of the three channels — each circuit responsible for one channel — can be performed in parallel within one clock cycle.
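The per-channel parallelism in this example can be modeled as follows — each of the three circuits performs a transposed convolution on its own channel, and the adder tree sums the per-channel results element-wise. This is a software sketch under stated assumptions (`deconv2d_single` is a hypothetical name; stride 1, no padding trim).

```python
import numpy as np

def deconv2d_single(x, w, stride=1):
    """Transposed convolution of one channel: every input pixel
    scatters a weighted copy of the kernel into the output."""
    h, wd = x.shape
    k = w.shape[0]
    out = np.zeros(((h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            out[i * stride:i * stride + k,
                j * stride:j * stride + k] += x[i, j] * w
    return out

# Three channels (e.g. R, G, B), each handled by its own circuit,
# then accumulated element-wise as the adder tree would.
rng = np.random.default_rng(0)
channels = [rng.standard_normal((5, 5)) for _ in range(3)]
kernels = [rng.standard_normal((3, 3)) for _ in range(3)]
result = sum(deconv2d_single(c, k) for c, k in zip(channels, kernels))
print(result.shape)   # (7, 7)
```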
In this embodiment, the output buffer 150 is configured to receive the deconvolution result and output the deconvolution result to the external memory 200.
Specifically, the output buffer 150 receives the deconvolution result of each channel, and then accumulates the deconvolution results of all channel data, and the results are temporarily stored in the output buffer 150.
Specifically, in this embodiment, as shown in fig. 5, the output buffer 150 includes: a plurality of parallel FIFO memories, wherein channel data passing through the same filter are accumulated and stored in the same FIFO memory; and a data selector (MUX) for returning each accumulated result to the adder tree until the adder tree outputs the final accumulated result.
Each FIFO memory outputs pixel data in the form of a K × W × H matrix; the output of one filter is stored across the K FIFO memories, and the data selector (MUX) is then used to narrow the data stream back to 1 × 1, so that pixels are output one at a time.
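The MUX's narrowing of the K-wide FIFO output to a 1 × 1 pixel stream can be sketched as a round-robin drain. The round-robin order is an assumption — the patent does not specify the selector's schedule — and `mux_serialize` is a hypothetical name.

```python
from collections import deque

def mux_serialize(fifos):
    """Drain K parallel FIFOs round-robin into a single 1x1 pixel
    stream, the way a MUX would narrow the K-wide output."""
    stream = []
    while any(fifos):
        for f in fifos:
            if f:
                stream.append(f.popleft())
    return stream

fifos = [deque([1, 4]), deque([2, 5]), deque([3, 6])]
print(mux_serialize(fifos))   # [1, 2, 3, 4, 5, 6]
```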
In this embodiment, the deconvolutor 100 further includes: and a pooling circuit 160 connected between the output buffer 150 and the external memory 200, for pooling the deconvolution result and outputting the pooled result to the external memory 200.
The pooling operation circuit 160 performs pooling over every two lines of pixel data, and it also includes a FIFO memory for storing each line of pixel data.
Specifically, the pooling mode may be max pooling or average pooling, either of which can be implemented by a logic circuit.
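Both pooling modes over line pairs can be sketched as follows. A 2 × 2 window is an assumption (the text only says "every two lines"), and `pool2x2` is a hypothetical name.

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling over every two rows/columns, as the pooling
    circuit consumes line pairs buffered in its FIFO."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]            # drop an odd trailing row/col
    blocks = x.reshape(h // 2, 2, w // 2, 2)  # group into 2x2 blocks
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.]])
print(pool2x2(x))             # [[6. 8.]]
print(pool2x2(x, "average"))  # [[3.5 5.5]]
```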
In this embodiment, the internal components of the deconvolutor 100, as well as the deconvolutor 100 and the external memory 200, are connected through first-in first-out data interfaces.
Specifically, the first-in first-out data interface includes: a first-in first-out memory, a first logic unit and a second logic unit.
Wherein the first-in first-out memory comprises: an upstream writable enable pin, a data input pin and a memory full state identification pin; and, a downstream readable enable pin, a data output pin, and a memory empty status identification pin;
the first logic unit is connected with an uplink object, the writable enabling pin and a full-state identification pin of the memory, and is used for determining whether the first-in first-out memory is full or not according to signals on the full-state identification pin of the memory when a write request of the uplink object is received; if not, sending an enabling signal to a writable enabling pin to enable the first-in first-out memory to be writable; otherwise, the first-in first-out memory is made non-writable.
Specifically, the first logic unit includes: a first inverter, whose input is connected to the memory-full status identification pin and whose output leads out a first data-valid identification end for connecting the upstream object; and a first AND gate, whose first input is connected to the first data-valid identification end, whose second input is connected to the upstream data-valid end for connecting the upstream object, and whose output is connected to the writable enable pin.
The second logic unit is connected to the downstream object, the readable enable pin, and the memory-empty status identification pin. When it receives a read request from the downstream object, it determines from the signal on the memory-empty status pin whether the first-in first-out memory is empty; if not, it sends an enable signal to the readable enable pin to make the first-in first-out memory readable; otherwise, it makes the first-in first-out memory unreadable.
Specifically, the second logic unit includes: a second inverter, whose input is connected to the memory-empty status identification pin and whose output leads out a downstream data-valid end for connecting the downstream object; and a second AND gate, whose first input is connected to the downstream data-valid end and whose second input is connected to a downstream data-valid identification end for connecting the downstream object.
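The gating performed by the two logic units reduces to two Boolean expressions — an inverter plus an AND gate on each side of the FIFO. The sketch below models that behavior; `fifo_gate` is an illustrative name.

```python
def fifo_gate(valid, full, request, empty):
    """Boolean model of the two logic units guarding the FIFO:
    write when upstream data is valid AND the FIFO is not full;
    read when downstream requests AND the FIFO is not empty."""
    write_enable = valid and (not full)    # first inverter + first AND gate
    read_enable = request and (not empty)  # second inverter + second AND gate
    return write_enable, read_enable

# Upstream has valid data, FIFO not full -> writable; FIFO empty -> not readable.
print(fifo_gate(True, False, True, True))  # (True, False)
```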
In this embodiment, the deconvolutor 100 operates as follows:
the data to be processed is read from the external memory 200 through the first-in first-out data interface and stored to BRAM of the first line buffer 120 (conv_in_cache shown in fig. 2).
The data to be processed consists of a feature map and convolution parameters. The size of the feature map is N_C × W_1 × H_1; the convolution parameters include the number of filters N_F, the convolution kernel size k × k, the stride s, and the boundary extension (padding) p.
The second line buffer 130 reads the N_F × (N_C × k × k) weight parameters (one channel at a time) from the external memory 200 through a first-in first-out data interface (SIF), and then stores them in the parameter buffer 110.
Once the parameter buffer 110 has been loaded with a weight parameter, the deconvolutor starts receiving and processing the pixel data of the feature map; through the processing of the first line buffer 120 and the second line buffer 130, the deconvolution operation circuit 140 receives k × k pixel data every clock cycle.
The deconvolution operation circuit 140 deconvolves the input data of each channel (a feature map of height H and width W per channel), and then outputs each channel's result to the output buffer 150.
The output buffer 150 accumulates the per-channel results until the N_F × W_2 × H_2 feature map is obtained.
Then the pooling operation circuit 160 may receive the N_F × W_2 × H_2 pixel data, perform the pooling processing, and output the feature map; alternatively, the feature map may be output directly from the output buffer 150.
After the pooling operation circuit 160 or the output buffer 150 has output the feature map processed by one filter, the parameter buffer 110 is reloaded with the next weight parameter, and the above pixel-processing procedure is iterated with different filters until the pixel processing of all deconvolution layers is complete.
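The relationship between the input size W_1 × H_1 and the output size W_2 × H_2 is not spelled out in the text above; the standard transposed-convolution formula, shown here as an assumption, is consistent with the parameters k, s, and p introduced earlier.

```python
def deconv_out_size(w_in, k, s, p):
    """Standard transposed-convolution output width/height:
    W_2 = (W_1 - 1) * s - 2 * p + k.
    An assumption: the patent does not state the formula explicitly."""
    return (w_in - 1) * s - 2 * p + k

# A 5-wide input, 3x3 kernel, stride 2, padding 1 -> 9-wide output.
print(deconv_out_size(5, 3, 2, 1))  # 9
```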
The present embodiment also provides an artificial intelligence processing apparatus comprising the deconvolutor 100 as described above. The deconvolutor 100 is described in detail above and will not be described here again.
Wherein, the artificial intelligence processor includes Programmable Logic (PL) and a Processing System (PS). The processing system circuit includes a central processing unit, which can be implemented by an MCU, SoC, FPGA, or DSP — for example, an embedded processor chip with the ARM architecture. The central processing unit is communicatively connected to the external memory 200, which is, for example, a RAM or ROM memory such as third-generation or fourth-generation DDR SDRAM; the central processing unit can read data from and write data to the external memory 200.
In summary, the deconvolutor of the present invention is composed of hardware such as a parameter buffer, an input buffer, a deconvolution operation circuit, an output buffer, a pooling operation circuit, and first-in first-out data interfaces. It can process highly complex deconvolution neural network algorithms at high speed, and effectively solves the prior-art problems of slow processing speed and high processor-performance requirements caused by software implementation. The invention thus effectively overcomes various shortcomings of the prior art and has high industrial value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.
Claims (7)
1. The deconvolutor is electrically connected to an external memory, wherein the external memory stores data to be processed and weight parameters; characterized in that the deconvolutor comprises: the device comprises a parameter buffer, an input buffer, a deconvolution operation circuit and an output buffer;
the parameter buffer is used for receiving and outputting the weight parameters;
the input buffer includes: a plurality of connected line buffers for receiving and outputting the data to be processed; wherein each line buffer assembles the data it receives one element at a time into a full line of data for output;
the deconvolution operation circuit is used for receiving the data to be processed from the input buffer, receiving weight parameters from the parameter buffer, performing deconvolution operation according to the weight parameters, and outputting deconvolution operation results;
the output buffer is used for receiving the deconvolution operation result and outputting the deconvolution operation result to the external memory;
the input buffer includes:
a first line buffer for receiving pixel data of the feature map to be processed bit by bit, outputting one line of pixel data at a time after filtering, and storing the feature maps of each deconvolution layer;
the input buffer further includes:
and the at least one second line buffer is used for acquiring weight parameters of each filter from the external memory and sequentially inputting the weight parameters into the parameter buffer.
2. The deconvolutor of claim 1, wherein the first line buffer outputs line pixel data for each of the deconvolution layers in turn, and outputs line pixel data for each of the channel data in turn as each of the deconvolution layer line pixel data is output.
3. The deconvolutor of claim 1, wherein the deconvolution operation circuit comprises:
a plurality of deconvolution kernels running in parallel, each of said deconvolution kernels comprising a multiplier for performing deconvolution operations;
an adder tree for accumulating the output results of the multipliers;
and each deconvolution kernel takes pixel data in the form of a K × K matrix as input and outputs pixel data bit by bit through the deconvolution operation on the input pixel data and the weight parameters.
4. A deconvolutor in accordance with claim 3, wherein the output buffer comprises:
at least two parallel FIFO memories, wherein channel data passing through the same filter are accumulated and then stored into the same FIFO memory;
and a data selector for returning each accumulated result to the adder tree until the adder tree outputs the final accumulated result.
5. The deconvolutor of claim 1, further comprising:
and the pooling operation circuit is connected between the output buffer and the external memory and is used for pooling the deconvolution operation result and outputting the pooled deconvolution operation result to the external memory.
6. The deconvolutor of claim 1, wherein the internal components of the deconvolutor, as well as the deconvolutor and the external memory, are connected through first-in first-out data interfaces.
7. An artificial intelligence processing apparatus comprising a deconvolutor as claimed in any one of claims 1 to 6.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/072659 WO2019136747A1 (en) | 2018-01-15 | 2018-01-15 | Deconvolver and an artificial intelligence processing device applied by same |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110178146A CN110178146A (en) | 2019-08-27 |
CN110178146B true CN110178146B (en) | 2023-05-12 |
Family
ID=67218472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880002766.XA Active CN110178146B (en) | 2018-01-15 | 2018-01-15 | Deconvolutor and artificial intelligence processing device applied by deconvolutor |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110178146B (en) |
WO (1) | WO2019136747A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110727633A (en) * | 2019-09-17 | 2020-01-24 | 广东高云半导体科技股份有限公司 | Edge artificial intelligence computing system framework based on SoC FPGA |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022468A (en) * | 2016-05-17 | 2016-10-12 | 成都启英泰伦科技有限公司 | Artificial neural network processor integrated circuit and design method therefor |
CN106066783A (en) * | 2016-06-02 | 2016-11-02 | 华为技术有限公司 | Neural network forward-operation hardware architecture based on power-weight quantization |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | Deep convolutional neural network implementation method based on FPGA |
CN106355244A (en) * | 2016-08-30 | 2017-01-25 | 深圳市诺比邻科技有限公司 | CNN (convolutional neural network) construction method and system |
CN106875011A (en) * | 2017-01-12 | 2017-06-20 | 南京大学 | The hardware structure and its calculation process of two-value weight convolutional neural networks accelerator |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing a sparse convolutional neural network accelerator |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | General fixed-point neural network convolution accelerator hardware architecture based on FPGA |
CN107403117A (en) * | 2017-07-28 | 2017-11-28 | 西安电子科技大学 | Three dimensional convolution device based on FPGA |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160379109A1 (en) * | 2015-06-29 | 2016-12-29 | Microsoft Technology Licensing, Llc | Convolutional neural networks on hardware accelerators |
US10497089B2 (en) * | 2016-01-29 | 2019-12-03 | Fotonation Limited | Convolutional neural network |
2018
- 2018-01-15 WO PCT/CN2018/072659 patent/WO2019136747A1/en active Application Filing
- 2018-01-15 CN CN201880002766.XA patent/CN110178146B/en active Active
Non-Patent Citations (1)
Title |
---|
Hongxiang Fan et al.; "F-C3D: FPGA-based 3-Dimensional Convolutional Neural Network"; 2017 27th International Conference on Field Programmable Logic and Applications (FPL); 2017-10-05; Sections 1-2, Figures 2-4, Table 1 * |
Also Published As
Publication number | Publication date |
---|---|
CN110178146A (en) | 2019-08-27 |
WO2019136747A1 (en) | 2019-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019136764A1 (en) | Convolutor and artificial intelligent processing device applied thereto | |
CN110050267B (en) | System and method for data management | |
JP6857286B2 (en) | Improved performance of neural network arrays | |
CN111967468B (en) | Implementation method of lightweight target detection neural network based on FPGA | |
CN109284817B (en) | Deep separable convolutional neural network processing architecture/method/system and medium | |
US10394929B2 (en) | Adaptive execution engine for convolution computing systems | |
CN109886400B (en) | Convolution neural network hardware accelerator system based on convolution kernel splitting and calculation method thereof | |
CN110366732B (en) | Method and apparatus for matrix processing in convolutional neural networks | |
CN108108809B (en) | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof | |
US20180157969A1 (en) | Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network | |
WO2019136762A1 (en) | Artificial intelligence processor and processing method applied thereto | |
CN111210019B (en) | Neural network inference method based on software and hardware cooperative acceleration | |
CN110766127B (en) | Neural network computing special circuit and related computing platform and implementation method thereof | |
CN113051216B (en) | MobileNet-SSD target detection device and method based on FPGA acceleration | |
CN111582465B (en) | Convolutional neural network acceleration processing system and method based on FPGA and terminal | |
CN110738317A (en) | FPGA-based deformable convolution network operation method, device and system | |
CN110782430A (en) | Small target detection method and device, electronic equipment and storage medium | |
CN109740619B (en) | Neural network terminal operation method and device for target recognition | |
Shahshahani et al. | Memory optimization techniques for fpga based cnn implementations | |
CN113344179A (en) | IP core of binary convolution neural network algorithm based on FPGA | |
CN110178146B (en) | Deconvolutor and artificial intelligence processing device applied by deconvolutor | |
CN114359662A (en) | Implementation method of convolutional neural network based on heterogeneous FPGA and fusion multiresolution | |
CN111222090B (en) | Convolution calculation module, neural network processor, chip and electronic equipment | |
CN214586992U (en) | Neural network accelerating circuit, image processor and three-dimensional imaging electronic equipment | |
CN110765413B (en) | Matrix summation structure and neural network computing platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||