CN107992942B - Convolutional neural network chip and convolutional neural network chip operation method - Google Patents
- Publication number
- CN107992942B (application CN201610946130.3A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- network chip
- memory
- array
- Prior art date
- Legal status: Active (the status is an assumption, not a legal conclusion; no legal analysis has been performed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a convolutional neural network chip and an operation method thereof. The convolutional neural network chip of the invention comprises a predetermined number of memory arrays for storing the input data array of one layer of a convolutional neural network, wherein the predetermined number is greater than or equal to the kernel size of the convolutional neural network, and each row of the input data array is stored in turn in a corresponding row of the memory arrays. Compared with a conventional memory architecture, the invention achieves a several-fold improvement in memory read speed, and thereby the same overall speedup. The invention thus provides a new on-chip memory architecture for neural network chips that increases memory read speed.
Description
Technical Field
The invention relates to the field of semiconductor chips and artificial intelligence, in particular to a convolutional neural network chip and an operation method of the convolutional neural network chip.
Background
The human brain is a complex network of a great number of interconnected neurons. Each neuron receives information from many other neurons through its numerous dendrites; each connection point is called a synapse. When external stimuli accumulate beyond a certain level, the neuron generates a signal and transmits it out through its axon. An axon has many terminals, which are connected by synapses to the dendrites of many other neurons. It is such a network of functionally simple neurons that implements all human intelligent activity. Human memory and intelligence are generally believed to be stored in the different coupling strengths of the synapses.
Neurons fire at no more than about 100 Hz, and the CPU of a modern computer is some ten million times faster than the human brain, yet its ability to handle many complex problems remains inferior. This has prompted the computer industry to mimic the human brain. The earliest emulation was at the software level: neural network algorithms, which emerged in the 1960s, mimic the function of a neuron with a mathematical function. The function accepts multiple inputs, each with its own weight, and learning (training) is the process of adjusting these weights. The function's output feeds many other neurons, forming a network. These algorithms have achieved rich results and are widely applied.
The networks in neural network algorithms are divided into many layers. In the earliest networks, every neuron in one layer was connected to every neuron in the next, forming a fully connected network. One problem with fully connected networks is that in image processing the number of pixels is large, and the number of weights required per layer grows with the square of the number of pixels, so such a network occupies too much memory and is computationally infeasible.
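To make the quadratic growth concrete, a back-of-the-envelope sketch (the image size below is an illustrative assumption, not taken from the patent):

```python
# Back-of-the-envelope weight count (illustrative numbers, not from the
# patent): a fully connected layer between two images of n_pixels pixels
# each needs n_pixels**2 weights.
n_pixels = 256 * 256          # one 256 x 256 image
fc_weights = n_pixels ** 2    # every output neuron sees every input pixel
print(fc_weights)             # 4294967296, i.e. ~4.3 billion weights
```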
In convolutional neural networks, most of the earlier layers are no longer fully connected. The neurons of each layer are arranged in an array, like an image. Each neuron of the next layer is connected to only a small region of this layer, usually a square region of side length k, called the kernel size of the convolutional network, as shown in fig. 1.
Convolutional neural networks (CNN) are so named because the weighted sum over this small region resembles a convolution. The same set of weights is used at every position in the same layer (translational invariance), which drastically reduces the number of weights compared with a fully connected network and makes high-resolution image processing possible. A convolutional neural network comprises several such convolutional layers, together with other kinds of layers.
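The shared-weight convolution described above can be sketched as follows (a minimal illustration; the function and variable names are assumptions, not from the patent):

```python
import numpy as np

# Minimal sketch of the translation-invariant convolution described above:
# the same k x k set of weights is applied at every position of the image.
def conv2d_valid(image, kernel):
    k = kernel.shape[0]                      # kernel size k (square kernel)
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum over the small k x k region.
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0               # 3x3 averaging kernel
print(conv2d_valid(image, kernel))           # 2x2 output map
```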
With the spread of deep learning applications, dedicated neural network chips began to be developed. The additions and multiplications of neuron computation are implemented by special-purpose circuits, which is far more efficient than a CPU or GPU.
The human brain is characterized by massively parallel computation: a huge number of neurons work simultaneously, and each neuron is connected to thousands of others. With modern integrated circuit technology it is easy to integrate a large number of neurons on one chip, but very difficult to provide internal communication bandwidth like the human brain's. For example, if the input data of a layer of neurons is stored in a single RAM, reading the k rows of a kernel window takes at least k clock cycles, since different rows of the same memory cannot be read or written simultaneously. The speed of reading data, i.e. memory bandwidth, is therefore the bottleneck of the computation.
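This read-bandwidth argument can be captured in a toy cycle-count model (an illustrative sketch under the stated assumption, not part of the patent):

```python
# Toy cycle-count model of the read bottleneck (illustrative assumption:
# each memory delivers one row per clock cycle, and two rows stored in the
# same memory can never be read in the same cycle).
def cycles_to_read_window(k, num_memories):
    """Cycles needed to fetch the k rows of a k x k kernel window."""
    rows_per_memory = -(-k // num_memories)   # ceiling division
    return rows_per_memory

k = 5
print(cycles_to_read_window(k, 1))   # single RAM: 5 cycles
print(cycles_to_read_window(k, k))   # k separate arrays: 1 cycle
```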
Disclosure of Invention
In view of the above-mentioned defects in the prior art, the present invention provides a new neural network on-chip memory architecture capable of increasing the memory readout speed.
In order to achieve the above object, the present invention provides a convolutional neural network chip comprising a predetermined number of memory arrays for storing the input data array of one layer of a convolutional neural network, wherein the predetermined number is greater than or equal to the kernel size of the convolutional neural network, and wherein each row of the input data array is stored in turn in a corresponding row of the memory arrays.
Preferably, the memory array employs magnetoresistive random access memory.
In order to achieve the above object, the present invention provides a convolutional neural network chip operating method, including:
a storage step: storing the input data array of one layer of a convolutional neural network using a convolutional neural network chip comprising a predetermined number of memory arrays, wherein the predetermined number is greater than or equal to the kernel size of the convolutional neural network, and wherein each row of the input data array is stored in turn in a corresponding row of the memory arrays;
a calculation step: the neurons read data simultaneously from a number of arrays equal to the kernel size.
Preferably, in the calculation step, a plurality of neurons located in the same row read data simultaneously from the same row of the plurality of arrays, and the calculations are performed in parallel.
Preferably, the memory array employs magnetoresistive random access memory.
Compared with a conventional memory architecture, the invention achieves a k-fold improvement in memory read speed, and thereby the same overall speedup. The invention thus provides a new neural network chip memory architecture that increases memory read speed.
The conception, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features, and effects can be fully understood.
Drawings
Fig. 1 is an architecture of a convolutional neural network.
Fig. 2 is a schematic structural diagram of a convolutional neural network chip according to a preferred embodiment of the present invention.
Fig. 3 is a flow chart of a convolutional neural network chip operation method according to a preferred embodiment of the present invention.
It is to be noted, however, that the appended drawings illustrate rather than limit the invention, and that drawings representing structures may not be drawn to scale. In the drawings, the same or similar elements are denoted by the same or similar reference numerals.
Detailed Description
Fig. 2 is a schematic structural diagram of a convolutional neural network chip according to a preferred embodiment of the present invention.
Specifically, as shown in fig. 2, the convolutional neural network chip according to the preferred embodiment of the present invention comprises a predetermined number of memory arrays for storing the input data array of one layer of the convolutional neural network, wherein the predetermined number is greater than or equal to the kernel size k of the convolutional neural network, and wherein each row of the input data array is stored in turn in a corresponding row of the memory arrays.
Thus, when performing the computation, a neuron can read data from the k arrays simultaneously, obtaining all the data required for a computation in one cycle. Furthermore, multiple neurons in the same row can read data simultaneously from the same row of the k arrays, with the calculations performed in parallel.
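A minimal sketch of this row-interleaved layout (illustrative; mapping row r of the input to array r mod k is one way to realize "each row stored in turn in a respective row of the memory arrays"): any k consecutive rows then lie in k different arrays and can be fetched together.

```python
import numpy as np

# Sketch of the row-interleaved layout (illustrative names, assuming row r
# of the input goes to memory array r mod k): the k rows of any kernel
# window land in k different arrays and can be read in a single cycle.
def stripe_rows(data, k):
    banks = [[] for _ in range(k)]
    for r, row in enumerate(data):
        banks[r % k].append(row)              # row r -> array (r mod k)
    return banks

def read_window_rows(banks, k, top_row):
    # One "cycle": fetch one row from each of the k arrays in parallel.
    return [banks[(top_row + i) % k][(top_row + i) // k] for i in range(k)]

k = 3                                         # kernel size
data = np.arange(30).reshape(6, 5)            # 6x5 input data array
banks = stripe_rows(data, k)
window = read_window_rows(banks, k, top_row=2)  # rows 2, 3, 4 in one cycle
print(np.array(window))
```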
Accordingly, FIG. 3 is a flow chart of a method of operation of a convolutional neural network chip, in accordance with a preferred embodiment of the present invention.
Specifically, as shown in fig. 3, the convolutional neural network chip operation method according to the preferred embodiment of the present invention includes:
storage step S1: storing the input data array of one layer of a convolutional neural network using a convolutional neural network chip comprising a predetermined number of memory arrays, wherein the predetermined number is greater than or equal to the kernel size k of the convolutional neural network, and wherein each row of the input data array is stored in turn in a corresponding row of the memory arrays;
calculation step S2: the neurons read data from the k arrays simultaneously, so that all the data needed for a computation is read in one cycle.
In calculation step S2, a plurality of neurons located in the same row may read data simultaneously from the same row of the k arrays, with the calculations performed in parallel.
Compared with a conventional memory architecture, the invention achieves a k-fold improvement in memory read speed, and thereby the same overall speedup. The invention thus provides a new neural network chip memory architecture that increases memory read speed.
Moreover, the present invention is applicable to any memory technology, but magnetoresistive random access memory (MRAM) is currently the memory technology best suited for integration with logic circuits, so the invention is most suitably applied with MRAM. In particular, the memory array of the present invention employs magnetoresistive random access memory.
While the foregoing shows and describes preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein; these embodiments are not to be construed as excluding others, and the invention is capable of use in various other combinations, modifications, and environments, and of changes within the scope of the inventive concept described herein, commensurate with the above teachings or the skill and knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention fall within the scope of the appended claims.
Claims (5)
1. A convolutional neural network chip, comprising: a predetermined number of memory arrays for storing the input data array of one layer of a convolutional neural network, wherein the predetermined number is greater than or equal to the kernel size of the convolutional neural network; and wherein each row of the input data array is stored in turn in a corresponding row of the memory arrays.
2. The convolutional neural network chip of claim 1, wherein the memory array employs magnetoresistive random access memory.
3. A convolutional neural network chip operation method, comprising:
a storage step: storing the input data array of one layer of a convolutional neural network using a convolutional neural network chip comprising a predetermined number of memory arrays, wherein the predetermined number is greater than or equal to the kernel size of the convolutional neural network; and wherein each row of the input data array is stored in turn in a corresponding row of the memory arrays;
a calculation step: the neurons read data simultaneously from a number of arrays equal to the kernel size.
4. The convolutional neural network chip operating method of claim 3, wherein in the calculation step, a plurality of neurons located in the same row read data simultaneously from the same row of the plurality of arrays, and the calculations are performed in parallel.
5. The convolutional neural network chip operating method of claim 3 or 4, wherein the memory array employs magnetoresistive random access memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610946130.3A CN107992942B (en) | 2016-10-26 | 2016-10-26 | Convolutional neural network chip and convolutional neural network chip operation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610946130.3A CN107992942B (en) | 2016-10-26 | 2016-10-26 | Convolutional neural network chip and convolutional neural network chip operation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107992942A CN107992942A (en) | 2018-05-04 |
CN107992942B true CN107992942B (en) | 2021-10-01 |
Family
ID=62029151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610946130.3A Active CN107992942B (en) | 2016-10-26 | 2016-10-26 | Convolutional neural network chip and convolutional neural network chip operation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107992942B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112970037B (en) * | 2018-11-06 | 2024-02-02 | 创惟科技股份有限公司 | Multi-chip system for implementing neural network applications, data processing method suitable for multi-chip system, and non-transitory computer readable medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9902115D0 (en) * | 1999-02-01 | 1999-03-24 | Axeon Limited | Neural networks |
US7401058B2 (en) * | 2004-04-29 | 2008-07-15 | University Of Massachusetts | Artificial neuron with phase-encoded logic |
US8515885B2 (en) * | 2010-10-29 | 2013-08-20 | International Business Machines Corporation | Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation |
KR20130090147A (en) * | 2012-02-03 | 2013-08-13 | 안병익 | Neural network computing apparatus and system, and method thereof |
CN105760931A (en) * | 2016-03-17 | 2016-07-13 | 上海新储集成电路有限公司 | Artificial neural network chip and robot with artificial neural network chip |
CN105789139B (en) * | 2016-03-31 | 2018-08-28 | 上海新储集成电路有限公司 | A kind of preparation method of neural network chip |
- 2016-10-26: CN application CN201610946130.3A filed; patent CN107992942B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN107992942A (en) | 2018-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11055608B2 (en) | Convolutional neural network | |
CN109460817B (en) | Convolutional neural network on-chip learning system based on nonvolatile memory | |
Rathi et al. | STDP-based pruning of connections and weight quantization in spiking neural networks for energy-efficient recognition | |
CN113537471B (en) | Improved spiking neural network | |
TWI661428B (en) | Neuromorphic weight cell and method of forming the same and artificial neural network | |
JP7399517B2 (en) | Memristor-based neural network parallel acceleration method, processor, and device | |
KR101686827B1 (en) | Method for implementing artificial neural networks in neuromorphic hardware | |
WO2019136764A1 (en) | Convolutor and artificial intelligent processing device applied thereto | |
Taha et al. | Memristor crossbar based multicore neuromorphic processors | |
CN111210019B (en) | Neural network inference method based on software and hardware cooperative acceleration | |
JP7150998B2 (en) | Superconducting neuromorphic core | |
KR102618546B1 (en) | 2-dimensional array based neuromorphic processor and operating method for the same | |
CN109496319A (en) | Artificial intelligence process device hardware optimization method, system, storage medium, terminal | |
Sun et al. | Low-consumption neuromorphic memristor architecture based on convolutional neural networks | |
Cho et al. | An on-chip learning neuromorphic autoencoder with current-mode transposable memory read and virtual lookup table | |
CN108154225B (en) | Neural network chip using analog computation | |
Tran et al. | Memcapacitive reservoir computing | |
CN114925320B (en) | Data processing method and related device | |
CN107992942B (en) | Convolutional neural network chip and convolutional neural network chip operation method | |
CN108154226B (en) | Neural network chip using analog computation | |
CN108154227B (en) | Neural network chip using analog computation | |
Sun et al. | Quaternary synapses network for memristor-based spiking convolutional neural networks | |
Hossain et al. | Reservoir computing system using biomolecular memristor | |
CN110178146B (en) | Deconvolutor and artificial intelligence processing device applied by deconvolutor | |
Verma et al. | Advances in neuromorphic spin-based spiking neural networks: a review |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||