CN106485317A - Neural network accelerator and method for implementing a neural network model - Google Patents
Neural network accelerator and method for implementing a neural network model (Download PDF / Info)
- Publication number
- CN106485317A (application CN201610851931.1A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- implementation method
- data storage
- neural network accelerator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Memory System (AREA)
Abstract
The present invention relates to a data processing method for neural networks, and more particularly to a neural network accelerator and a method for implementing a neural network model. In the method of implementing the neural network accelerator, a non-volatile memory includes a data storage array fabricated in the back-end-of-line process flow; in the front-end-of-line process flow that precedes fabrication of the data storage array, a neural network accelerator circuit is fabricated on the silicon substrate beneath the data storage array. In the method of implementing the neural network model, the model comprises input signals, connection-weight signals, biases, an activation function, operation functions and output signals; the activation function and operation functions are realized by the neural network accelerator circuit, while the input signals, connection weights, biases and output signals are stored in the data storage array. Realizing the neural network accelerator circuit directly beneath the data storage array leaves the data storage bandwidth unconstrained, and the data survive power loss.
Description
Technical field
The present invention relates to a data processing method for neural networks, and more particularly to a method for implementing a neural network accelerator. It further relates to a method for implementing a neural network model.
Background technology
Artificial neural networks (ANNs) have been a research hotspot in artificial intelligence since the 1980s. An ANN abstracts the neuronal network of the human brain from an information-processing perspective, builds simple models of it, and composes different networks from different connection patterns. In engineering and academia it is often simply called a neural network. A neural network is a computational model consisting of a large number of interconnected nodes (neurons); Fig. 1 shows a schematic diagram of one neuron. Each node represents a specific output function, called the activation (excitation) function, and each connection between two nodes carries a weighted value, called a weight, for the signal passing through it; the weights serve as the memory of the ANN. The network's output depends on its connection pattern, its weights and its activation functions. The network itself usually approximates some natural algorithm or function, and may also express a logical strategy.
In recent years, heterogeneous accelerators have become the mainstream direction of computer-architecture research thanks to their outstanding performance-per-watt. Meanwhile, with the rise of deep learning, research on deep neural networks has returned to the forefront of machine learning. Deep learning is a new field within machine learning whose motivation is to build neural networks that simulate the way the human brain analyzes and learns, interpreting data such as images, sound and text by imitating mechanisms of the brain. Deep learning algorithms are complex machine learning algorithms whose results in speech and image recognition far surpass the prior related art. How to realize neural-network processing systems efficiently on accelerators has therefore drawn wide attention from academia and industry. At present, deep learning algorithms can run on cloud CPUs; although their load capacity is strong, they have no advantage in performance or power consumption, and their cost is very high. A CPU operates at 2-3 GHz while, by comparison, the human brain operates at only a few hertz; and the data-bandwidth bottleneck between the brain's massively parallel, distributed structure with discrete memory and the serial, centralized von Neumann architecture with its separate processor makes this implementation neither effective nor economical. Another approach to deep learning uses GPUs (graphics processors) to exploit their excellent floating-point performance: deep learning requires very high internal parallelism, massive floating-point capability and matrix operations, which GPUs can provide, offering, at the same precision, faster processing, a smaller server footprint and lower power consumption. GPUs, however, are costly and energy-hungry, and their main performance bottleneck is the limit that PCIe (Peripheral Component Interconnect Express) bandwidth places on data transfer.
Although the brain's neurons conduct signals slowly, they are enormous in number, and each neuron connects to other neurons through thousands of synapses, forming a vast neuronal circuit that conducts signals in a distributed, concurrent fashion and thereby compensates for the slow processing speed of individual neurons. IBM's SyNAPSE project built a giant neural-network chip that provides one million neuron cores, 256 million synapse cores and 4096 neurosynaptic cores within a power budget of 70 milliwatts, allowing its neural-network and machine-learning load capacity to surpass the von Neumann architecture. Evidently, chips custom-built for artificial neural networks hold clear advantages in integration density, power consumption and machine-learning load capacity. Such chips, however, use volatile SRAM (Static Random Access Memory) for data storage. SRAM reads and writes extremely fast, but a brain-like neural network does not need extreme processing speed; higher storage density and concurrency are the top priorities. SRAM-based chips also occupy large die area, consume much power, and lose all data irrecoverably at power-down, so the data learned by the network must be copied to extra non-volatile storage before power-off, consuming still more power. The traditional alternative of simulating neurons and synapses in software has no advantage in power consumption or performance (it requires energy-hungry CPUs or GPUs), and it scales and adapts poorly when simulating complex cerebral-cortex behavior models and neuronal structures.
Content of the invention
To address the problems that current software implementations of neural network accelerators lack flexibility, while hardware implementations suffer from restricted data storage bandwidth and are prone to data loss after power-down, the present invention provides a method for implementing a neural network accelerator and a neural network model.
The present invention solves the technical problem with the following technical scheme:
A method for implementing a neural network accelerator comprises a non-volatile memory that includes a data storage array fabricated in the back-end-of-line (BEOL) process flow; in the front-end-of-line (FEOL) process flow that precedes fabrication of the data storage array, a neural network accelerator circuit is fabricated on the silicon substrate beneath the data storage array.
Preferably, the method further comprises fabricating peripheral logic circuitry on the silicon substrate beneath the data storage array during the same front-end-of-line process flow.
Preferably, the neural network accelerator circuit executes mathematical operations, array operations, matrix operations, state operations, checkpointing, or queue and synchronization operations.
Preferably, the neural network accelerator circuit further includes transistor logic circuitry for realizing feedback neural networks.
Preferably, the feedback neural networks include Elman networks and Hopfield networks.
Preferably, the non-volatile memory is a 3D NAND memory or a 3D phase-change memory.
Preferably, the data storage array consists of vertically stacked multi-layer data storage cells.
A method for implementing a neural network model, using the accelerator described above, wherein the neural network model comprises input signals X1 to Xn transmitted from other neurons, connection-weight signals Wij (j = 1 to n) from neuron j to neuron i, a bias, an activation function, operation functions and an output signal.
The activation function and operation functions are realized by the neural network accelerator circuit;
the input signals, connection weights, bias and output signal are stored in the data storage array of the non-volatile memory.
Preferably, formula operations or logical operations are performed on the input signals, connection-weight signals, bias, activation function and operation functions, and these formula or logical operations are realized by the neural network accelerator circuit.
Beneficial effects of the present invention: the invention uses 3D non-volatile memory to store the results of brain-neuron deep learning, largely satisfying the huge memory-capacity demand of the enormous numbers of neurons and synapses involved in deep learning; and because the neural network accelerator circuit is realized directly beneath the 3D data storage array, the data storage bandwidth is unconstrained and data are not lost after power-down. The result is a low-power, low-cost, high-performance implementation. The neural network accelerator circuit together with the data storage array can also realize neural network models, sharply reducing memory consumption and improving performance.
Description of the drawings
Fig. 1 is a schematic diagram of a neuron in the prior art;
Fig. 2a is a perspective schematic view of the non-volatile memory structure of the present invention;
Fig. 2b is a cross-sectional schematic view of the non-volatile memory structure of the present invention;
Fig. 3 is the neural network model of the M-P model;
Fig. 4 is a schematic diagram of an implementation of the neural network model of Fig. 3 according to the present invention;
Fig. 5 is a schematic diagram of an implementation of a neural network model that differentiates the input signals according to the present invention;
Fig. 6 is a schematic diagram of an implementation of a neural network model that integrates the input signals according to the present invention;
Fig. 7 is a schematic diagram of a Hopfield neural network.
Specific embodiment
The invention is further described below with reference to the accompanying drawings and specific embodiments, which are not to be taken as limiting the invention.
The present invention proposes a method for implementing a neural network accelerator based on non-volatile memory. The non-volatile memory adopts a non-planar design: multi-layer data storage cells are stacked vertically using the back-end-of-line (BEOL) process flow to obtain higher storage density, and thus greater storage capacity in less space, bringing large cost savings and lower energy consumption; 3D NAND memory and 3D phase-change memory are examples. Beneath these 3D data storage arrays 1 lies the memory's peripheral logic circuitry, fabricated in the front-end-of-line (FEOL) process flow. As memory chip capacity keeps growing, the capacity of the data storage array 1 increases while the area of the corresponding peripheral logic grows much less, freeing considerable extra area. In the implementation of a neural network accelerator according to the present invention, shown in Figs. 2a and 2b and based on a 3D non-volatile memory process, a neural network accelerator circuit 3 is fabricated beneath the data storage array 1 during the FEOL process flow, that is, in the blank space beside the peripheral logic on the memory's silicon substrate 2; this circuit constitutes the neural network accelerator. The neural network accelerator realizes all neurons and synapses in hardware circuitry, while the memory stores the input and output data.
The neural network accelerator circuit 3 can realize a variety of computations and operations, specifically:
- element-level mathematical operations, such as addition, subtraction, multiplication, division, exponentiation, logarithm, integration, differentiation, and greater-than, less-than and equality comparisons;
- array operations, such as array merging, array slicing, array decomposition, array sorting, array reshaping and array shuffling;
- matrix operations, such as matrix multiplication, matrix inversion and matrix determinant;
- state operations, such as assignment;
- neural-network construction modules, such as checkpointing (save and restore);
- queue and synchronization operations, such as enqueue and dequeue;
- control-flow operations, and so on.
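As an illustration, the operation classes above can be modeled in ordinary software. The following Python sketch is ours, not the patent's circuit; it shows an element-level operation, an array operation and a small matrix multiplication of the kinds listed:

```python
# Software model of a few operation classes the patent assigns to the
# accelerator circuit (illustrative only; names are ours).

def ew_add(a, b):
    """Element-level addition over two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def array_slice(a, start, stop):
    """Array-slice operation: extract a contiguous sub-array."""
    return a[start:stop]

def matmul(A, B):
    """Matrix multiplication for row-major nested lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

print(ew_add([1, 2, 3], [4, 5, 6]))                 # [5, 7, 9]
print(matmul([[1, 2], [3, 4]], [[1, 0], [0, 1]]))   # [[1, 2], [3, 4]]
```

In the architecture described here, each such operation would be a hardware primitive of circuit 3, with its operands and results held in the data storage array 1 above it.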
The present invention also proposes a method for implementing a neural network model based on the above neural network accelerator. A neural network model known as the M-P model (McCulloch-Pitts model), shown in Fig. 3, is realized with matrix multiplication and addition: X1 to Xn are the input signals transmitted from other neurons, Wij (j = 1 to n) denotes the connection weight from neuron j to neuron i, and b denotes the bias. The relation between the output of neuron i and its inputs is then:
net_i = Σ_j Wij · Xj + b (j = 1 to n)
y_i = f(net_i)
where y_i denotes the output of neuron i, the function f is called the activation function or transfer function, and net_i is called the net activation. If the net activation of a neuron is positive, the neuron is said to be in the active or excitatory state; if the net activation is negative, the neuron is said to be in the inhibited state. Such a neural network model forms a processing unit, and the above process can clearly be realized in software. The present invention instead realizes it in hardware, as shown in Fig. 4: the matrix multiplication, the addition and the activation function are all realized by the hardware circuits of the neural network accelerator, fabricated with the FEOL process flow in the silicon space beneath the 3D non-volatile data storage array 1; and the input signals X1 to Xn of neuron i, the connection weights Wij (j = 1 to n) between neurons, the bias value b and the neuron's output value y_i can all be stored in the data storage array 1 of the 3D non-volatile memory. This not only makes full use of the vacant silicon area beneath the 3D array but also gives the neuronal network high storage density, greatly saving data-transfer power.
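The M-P forward pass described above (a net activation followed by an activation function) can be sketched in software as follows. The sigmoid activation and the dictionary standing in for the non-volatile data storage array are our illustrative assumptions; the patent leaves f generic:

```python
import math

def mp_neuron(x, w, b):
    """McCulloch-Pitts style neuron:
    net_i = sum_j(Wij * Xj) + b,  y_i = f(net_i).
    f here is a sigmoid (our choice; the patent does not fix f)."""
    net = sum(wj * xj for wj, xj in zip(w, x)) + b
    y = 1.0 / (1.0 + math.exp(-net))   # activation (transfer) function
    return net, y

# Storage model: inputs, weights, bias and output all live in the same
# (here simulated) non-volatile data store, as the patent describes.
store = {"x": [0.5, -1.0, 2.0], "w": [0.8, 0.2, 0.1], "b": 0.1}
net, y = mp_neuron(store["x"], store["w"], store["b"])
store["y"] = y
print(net > 0)   # True: positive net activation -> the neuron is active
```

In hardware the multiply-accumulate and the activation would run on accelerator circuit 3, with `store` replaced by the data storage array 1.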
Furthermore, many optimization algorithms, including common machine-learning training algorithms such as stochastic gradient descent, need to differentiate or integrate a series of input signals, as shown in Fig. 5 and Fig. 6; realized in software, such computations consume a great deal of memory. Differentiation and integration can likewise be realized with the neural network model implementation method of the present invention. The neural network model that differentiates the input signals is realized on the neural network accelerator: the derivative of the matrix multiplication, the derivative of the addition and the derivative of the activation function are all computed by the accelerator's hardware circuits, while the derivative dy/dx with respect to the neuron's input signals, the derivative dy/dW with respect to the connection weights between neurons, the derivative dy/db with respect to the bias signal, and the neuron's output value can all be stored in the data storage array 1 of the 3D non-volatile memory. In the same way, neural network models that take higher-order derivatives or higher-order integrals of a series of input signals can also be realized with the method of the present invention.
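A minimal sketch of the derivative values described above, assuming a sigmoid activation (our assumption; the patent does not fix the activation function): dy/dx, dy/dW and dy/db are computed analytically by the chain rule, and one of them is checked against a finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_with_grads(x, w, b):
    """Forward pass plus the derivative values the patent stores alongside
    the data: dy/dx, dy/dW and dy/db. Sigmoid f is our assumption."""
    net = sum(wj * xj for wj, xj in zip(w, x)) + b
    y = sigmoid(net)
    g = y * (1.0 - y)                    # dy/dnet for a sigmoid
    dy_dx = [g * wj for wj in w]         # chain rule through W . x
    dy_dw = [g * xj for xj in x]
    dy_db = g
    return y, dy_dx, dy_dw, dy_db

x, w, b = [1.0, 2.0], [0.3, -0.1], 0.05
y, dy_dx, dy_dw, dy_db = neuron_with_grads(x, w, b)

# Sanity check: finite difference on b should match dy/db.
eps = 1e-6
y_plus = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b + eps)
print(abs((y_plus - y) / eps - dy_db) < 1e-4)   # True
```

In the patented scheme these derivative values, like the forward values, would be written back into the 3D data storage array rather than into host memory.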
A feedback neural network is one with feedback connections from its outputs back to its inputs, and its structure is more complex than that of an ordinary feed-forward network. Typical feedback networks are the Elman network and the Hopfield network. Fig. 7 shows a Hopfield network: layer 0 serves merely as the network's input and contains no actual neurons, so it performs no computation; layer 1 consists of actual neurons, which accumulate the sum of the products of the input information and the weight coefficients and produce their output information after processing by a nonlinear function f. Here f is a simple threshold function: if a neuron's accumulated input exceeds the bias b, the neuron's output takes the value 1; if it is below the bias b, the neuron's output takes the value of the bias b. Such a feedback network can clearly also be realized with the present method of implementing neural network models, using the transistor logic circuitry beneath the 3D non-volatile data storage array 1 to realize the above neural network model.
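For illustration, a discrete Hopfield network of the kind shown in Fig. 7 can be sketched as follows. The Hebbian weight rule, the bipolar {-1, +1} states and the zero threshold are conventional textbook simplifications of ours, not details from the patent:

```python
def hopfield_recall(pattern, probe, steps=5):
    """Tiny discrete Hopfield network: Hebbian weights W = p p^T with a
    zero diagonal, then synchronous sign-threshold updates. Bipolar
    {-1, +1} states are a common convention (our assumption)."""
    n = len(pattern)
    W = [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
         for i in range(n)]
    s = list(probe)
    for _ in range(steps):
        # threshold update: each neuron sums weighted inputs and fires
        s = [1 if sum(W[i][j] * s[j] for j in range(n)) >= 0 else -1
             for i in range(n)]
    return s

stored = [1, -1, 1, -1, 1]
noisy  = [1, -1, -1, -1, 1]                       # one bit flipped
print(hopfield_recall(stored, noisy) == stored)   # True
```

The recall from a corrupted probe back to the stored pattern illustrates why such a network needs feedback paths, which the patent proposes to realize with the transistor logic beneath the storage array.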
The present invention uses 3D non-volatile memory to store the input data and output results of the neurons and synapses, greatly increasing storage density, reducing cost, and guaranteeing that no data are lost after power-down. By contrast, the traditional approach of simulating neurons and synapses in software has no advantage in power consumption or performance and is poor in scalability and flexibility, while traditional hardware implementations of neural networks use volatile SRAM, occupy large chip area, consume much power, and lose their data at power-down. The present invention stores the results of brain-neuron deep learning in non-volatile memory, largely satisfying the huge memory-capacity demand of the enormous numbers of neurons and synapses involved in deep learning, and fabricates the neural network accelerator directly beneath the data storage array 1, so the data storage bandwidth is unconstrained and data are not lost after power-down. The neural network accelerator realized on a non-volatile memory process is therefore a low-power, low-cost, high-performance implementation.
The foregoing are merely preferred embodiments of the present invention and do not thereby limit its embodiments or scope of protection; those skilled in the art should appreciate that all schemes obtained by equivalent substitutions and obvious changes made using the description and drawings of the present invention fall within the scope of protection of the present invention.
Claims (9)
1. A method for implementing a neural network accelerator, characterized by comprising a non-volatile memory, the non-volatile memory including a data storage array fabricated in the back-end-of-line process flow, wherein, in the front-end-of-line process flow preceding fabrication of the data storage array, a neural network accelerator circuit is fabricated on the silicon substrate beneath the data storage array.
2. The method for implementing a neural network accelerator according to claim 1, characterized by further comprising fabricating peripheral logic circuitry on the silicon substrate beneath the data storage array in the same front-end-of-line process flow.
3. The method for implementing a neural network accelerator according to claim 1, characterized in that the neural network accelerator circuit executes mathematical operations, array operations, matrix operations, state operations, checkpointing, or queue and synchronization operations.
4. The method for implementing a neural network accelerator according to claim 1, characterized in that the neural network accelerator circuit further includes transistor logic circuitry for realizing feedback neural networks.
5. The method for implementing a neural network accelerator according to claim 4, characterized in that the feedback neural networks include Elman networks and Hopfield networks.
6. The method for implementing a neural network accelerator according to claim 1, characterized in that the non-volatile memory is a 3D NAND memory or a 3D phase-change memory.
7. The method for implementing a neural network accelerator according to claim 1, characterized in that the data storage array consists of vertically stacked multi-layer data storage cells.
8. A method for implementing a neural network model, using the method of any one of claims 1-7, characterized in that the neural network model includes input signals X1 to Xn transmitted from other neurons, connection-weight signals Wij (j = 1 to n) from neuron j to neuron i, a bias, an activation function, operation functions and an output signal; the activation function and operation functions are realized by the neural network accelerator circuit; and the input signals, connection weights, bias and output signal are stored in the data storage array of the non-volatile memory.
9. The method for implementing a neural network model according to claim 8, characterized in that formula operations or logical operations are performed on the input signals, connection-weight signals, bias, activation function and operation functions, the formula operations or logical operations being realized by the neural network accelerator circuit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610851931.1A CN106485317A (en) | 2016-09-26 | 2016-09-26 | Neural network accelerator and method for implementing a neural network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106485317A true CN106485317A (en) | 2017-03-08 |
Family
ID=58268847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610851931.1A Pending CN106485317A (en) | 2016-09-26 | 2016-09-26 | Neural network accelerator and method for implementing a neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106485317A (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633298A (en) * | 2017-03-10 | 2018-01-26 | 南京大学 | A kind of hardware structure of the recurrent neural network accelerator based on model compression |
CN108053848A (en) * | 2018-01-02 | 2018-05-18 | 清华大学 | Circuit structure and neural network chip |
CN108122031A (en) * | 2017-12-20 | 2018-06-05 | 杭州国芯科技股份有限公司 | A kind of neutral net accelerator architecture of low-power consumption |
CN108734270A (en) * | 2018-03-23 | 2018-11-02 | 中国科学院计算技术研究所 | A kind of compatible type neural network accelerator and data processing method |
CN108777155A (en) * | 2018-08-02 | 2018-11-09 | 北京知存科技有限公司 | Flash chip |
CN109359731A (en) * | 2018-09-27 | 2019-02-19 | 中科物栖(北京)科技有限责任公司 | A kind of Processing with Neural Network method and device based on chip design defect |
CN109389214A (en) * | 2017-08-11 | 2019-02-26 | 谷歌有限责任公司 | Neural network accelerator with the parameter resided on chip |
CN109616115A (en) * | 2018-12-29 | 2019-04-12 | 北京知存科技有限公司 | A kind of speech processing chip, System and method for |
CN109886306A (en) * | 2019-01-24 | 2019-06-14 | 国网山东省电力公司德州供电公司 | A kind of electric network failure diagnosis data cleaning method |
CN109961133A (en) * | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
CN110097182A (en) * | 2019-04-10 | 2019-08-06 | 常州大学 | Circuit is realized with the three-dimensional Hopfield neural network model of neuron activation gradient λ control |
CN110572593A (en) * | 2019-08-19 | 2019-12-13 | 上海集成电路研发中心有限公司 | 3D heap image sensor |
CN110956257A (en) * | 2018-09-26 | 2020-04-03 | 龙芯中科技术有限公司 | Neural network accelerator |
CN111738430A (en) * | 2019-03-25 | 2020-10-02 | 西部数据技术公司 | Enhanced memory device architecture for machine learning |
US10916306B2 (en) | 2019-03-07 | 2021-02-09 | Western Digital Technologies, Inc. | Burst mode operation conditioning for a memory device |
CN112397088A (en) * | 2019-08-12 | 2021-02-23 | 美光科技公司 | Predictive maintenance of automotive engines |
CN112396174A (en) * | 2019-08-12 | 2021-02-23 | 美光科技公司 | Storage device with neural network accelerator for predictive maintenance of a vehicle |
CN112446411A (en) * | 2019-08-12 | 2021-03-05 | 美光科技公司 | Storage and access of neural network inputs in automotive predictive maintenance |
CN113313243A (en) * | 2021-06-11 | 2021-08-27 | 海宁奕斯伟集成电路设计有限公司 | Method, device and equipment for determining neural network accelerator and storage medium |
US11133059B2 (en) | 2018-12-06 | 2021-09-28 | Western Digital Technologies, Inc. | Non-volatile memory die with deep learning neural network |
TWI771180B (en) * | 2017-06-16 | 2022-07-11 | 美商谷歌有限責任公司 | Neural network accelerator tile architecture with three-dimensional stacking |
CN114897800A (en) * | 2022-04-22 | 2022-08-12 | 南京航空航天大学 | Ultrahigh-speed X-ray image identification method and device based on SiC neural network chip |
US11501109B2 (en) | 2019-06-20 | 2022-11-15 | Western Digital Technologies, Inc. | Non-volatile memory die with on-chip data augmentation components for use with machine learning |
US11507835B2 (en) | 2020-06-08 | 2022-11-22 | Western Digital Technologies, Inc. | Neural network data updates using in-place bit-addressable writes within storage class memory |
US11520521B2 (en) | 2019-06-20 | 2022-12-06 | Western Digital Technologies, Inc. | Storage controller having data augmentation components for use with non-volatile memory die |
US11609623B2 (en) | 2017-09-01 | 2023-03-21 | Qualcomm Incorporated | Ultra-low power neuromorphic artificial intelligence computing accelerator |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110320261A1 (en) * | 2010-06-20 | 2011-12-29 | Jayant Kadambi | Quality Scoring System for Internet Advertising Loci |
CN104701309A (en) * | 2015-03-24 | 2015-06-10 | 上海新储集成电路有限公司 | Three-dimensional stacked nerve cell device and preparation method thereof |
CN105512723A (en) * | 2016-01-20 | 2016-04-20 | 南京艾溪信息科技有限公司 | Artificial neural network calculating device and method for sparse connection |
CN105760931A (en) * | 2016-03-17 | 2016-07-13 | 上海新储集成电路有限公司 | Artificial neural network chip and robot with artificial neural network chip |
- 2016-09-26: CN CN201610851931.1A patent/CN106485317A/en active Pending
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633298B (en) * | 2017-03-10 | 2021-02-05 | 南京风兴科技有限公司 | Hardware architecture of recurrent neural network accelerator based on model compression |
CN107633298A (en) * | 2017-03-10 | 2018-01-26 | 南京大学 | A kind of hardware structure of the recurrent neural network accelerator based on model compression |
TWI771180B (en) * | 2017-06-16 | 2022-07-11 | 美商谷歌有限責任公司 | Neural network accelerator tile architecture with three-dimensional stacking |
US11948060B2 (en) | 2017-06-16 | 2024-04-02 | Google Llc | Neural network accelerator tile architecture with three-dimensional stacking |
US11727259B2 (en) | 2017-08-11 | 2023-08-15 | Google Llc | Neural network accelerator with parameters resident on chip |
CN109389214A (en) * | 2017-08-11 | 2019-02-26 | 谷歌有限责任公司 | Neural network accelerator with the parameter resided on chip |
US11501144B2 (en) | 2017-08-11 | 2022-11-15 | Google Llc | Neural network accelerator with parameters resident on chip |
US11609623B2 (en) | 2017-09-01 | 2023-03-21 | Qualcomm Incorporated | Ultra-low power neuromorphic artificial intelligence computing accelerator |
US11733766B2 (en) | 2017-09-01 | 2023-08-22 | Qualcomm Incorporated | Ultra-low power neuromorphic artificial intelligence computing accelerator |
CN109961133A (en) * | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
CN108122031A (en) * | 2017-12-20 | 2018-06-05 | 杭州国芯科技股份有限公司 | A kind of neutral net accelerator architecture of low-power consumption |
CN108053848A (en) * | 2018-01-02 | 2018-05-18 | 清华大学 | Circuit structure and neural network chip |
CN108734270B (en) * | 2018-03-23 | 2020-11-10 | 中国科学院计算技术研究所 | Compatible neural network accelerator and data processing method |
CN108734270A (en) * | 2018-03-23 | 2018-11-02 | 中国科学院计算技术研究所 | A kind of compatible type neural network accelerator and data processing method |
CN108777155A (en) * | 2018-08-02 | 2018-11-09 | 北京知存科技有限公司 | Flash chip |
CN110956257A (en) * | 2018-09-26 | 2020-04-03 | 龙芯中科技术有限公司 | Neural network accelerator |
CN109359731B (en) * | 2018-09-27 | 2022-01-28 | 中科物栖(北京)科技有限责任公司 | Neural network processing method and device based on chip design defects |
CN109359731A (en) * | 2018-09-27 | 2019-02-19 | 中科物栖(北京)科技有限责任公司 | Neural network processing method and device based on chip design defects |
US11133059B2 (en) | 2018-12-06 | 2021-09-28 | Western Digital Technologies, Inc. | Non-volatile memory die with deep learning neural network |
US11705191B2 (en) | 2018-12-06 | 2023-07-18 | Western Digital Technologies, Inc. | Non-volatile memory die with deep learning neural network |
CN109616115A (en) * | 2018-12-29 | 2019-04-12 | 北京知存科技有限公司 | Speech processing chip, system, and method |
CN109886306B (en) * | 2019-01-24 | 2022-11-25 | 国网山东省电力公司德州供电公司 | Power grid fault diagnosis data cleaning method |
CN109886306A (en) * | 2019-01-24 | 2019-06-14 | 国网山东省电力公司德州供电公司 | Power grid fault diagnosis data cleaning method |
US10916306B2 (en) | 2019-03-07 | 2021-02-09 | Western Digital Technologies, Inc. | Burst mode operation conditioning for a memory device |
CN111738430B (en) * | 2019-03-25 | 2023-11-28 | 西部数据技术公司 | Enhanced memory device architecture for machine learning |
CN111738430A (en) * | 2019-03-25 | 2020-10-02 | 西部数据技术公司 | Enhanced memory device architecture for machine learning |
CN110097182B (en) * | 2019-04-10 | 2023-03-24 | 常州大学 | Implementation circuit of a three-dimensional Hopfield neural network model controlled by neuron activation gradient λ |
CN110097182A (en) * | 2019-04-10 | 2019-08-06 | 常州大学 | Implementation circuit of a three-dimensional Hopfield neural network model controlled by neuron activation gradient λ |
US11520521B2 (en) | 2019-06-20 | 2022-12-06 | Western Digital Technologies, Inc. | Storage controller having data augmentation components for use with non-volatile memory die |
US11501109B2 (en) | 2019-06-20 | 2022-11-15 | Western Digital Technologies, Inc. | Non-volatile memory die with on-chip data augmentation components for use with machine learning |
CN112446411A (en) * | 2019-08-12 | 2021-03-05 | 美光科技公司 | Storage and access of neural network inputs in automotive predictive maintenance |
CN112396174A (en) * | 2019-08-12 | 2021-02-23 | 美光科技公司 | Storage device with neural network accelerator for predictive maintenance of a vehicle |
CN112397088A (en) * | 2019-08-12 | 2021-02-23 | 美光科技公司 | Predictive maintenance of automotive engines |
US12061971B2 (en) | 2019-08-12 | 2024-08-13 | Micron Technology, Inc. | Predictive maintenance of automotive engines |
CN110572593B (en) * | 2019-08-19 | 2022-03-04 | 上海集成电路研发中心有限公司 | 3D stacked image sensor |
CN110572593A (en) * | 2019-08-19 | 2019-12-13 | 上海集成电路研发中心有限公司 | 3D stacked image sensor |
US11507835B2 (en) | 2020-06-08 | 2022-11-22 | Western Digital Technologies, Inc. | Neural network data updates using in-place bit-addressable writes within storage class memory |
CN113313243B (en) * | 2021-06-11 | 2023-06-06 | 海宁奕斯伟集成电路设计有限公司 | Neural network accelerator determining method, device, equipment and storage medium |
CN113313243A (en) * | 2021-06-11 | 2021-08-27 | 海宁奕斯伟集成电路设计有限公司 | Neural network accelerator determination method, device, equipment, and storage medium |
CN114897800A (en) * | 2022-04-22 | 2022-08-12 | 南京航空航天大学 | Ultrahigh-speed X-ray image identification method and device based on SiC neural network chip |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106485317A (en) | Neural network accelerator and implementation method of a neural network model | |
Ji et al. | NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints | |
Indiveri et al. | Memory and information processing in neuromorphic systems | |
CN107239824A (en) | Apparatus and method for implementing a sparse convolutional neural network accelerator | |
Snider et al. | From synapses to circuitry: Using memristive memory to explore the electronic brain | |
WO2020052342A1 (en) | Convolutional neural network on-chip learning system based on non-volatile memory | |
CN106529668A (en) | Operation device and method of an accelerator chip for deep neural network algorithms | |
Ankit et al. | TraNNsformer: Neural network transformation for memristive crossbar based neuromorphic system design | |
Plebe et al. | The unbearable shallow understanding of deep learning | |
CN106022468A (en) | Artificial neural network processor integrated circuit and design method therefor | |
CN107578098A (en) | Neural network processor based on systolic arrays | |
CN110163355A (en) | Computing device and method | |
CN110383300A (en) | Computing device and method | |
CN114492782B (en) | On-chip core compiling and mapping method and device of neural network based on reinforcement learning | |
CN111210019B (en) | Neural network inference method based on software and hardware cooperative acceleration | |
CN110321997A (en) | Highly parallel computing platform, system, and computing implementation method | |
CN107203808A (en) | Binary convolution unit and corresponding binary convolutional neural network processor | |
CN110163350A (en) | Computing device and method | |
Gao et al. | Natural scene recognition based on convolutional neural networks and deep Boltzmann machines | |
Ankit et al. | Trannsformer: Clustered pruning on crossbar-based architectures for energy-efficient neural networks | |
CN111275161A (en) | Competitive neural network framework based on DNA strand displacement | |
DE112020003055T5 | Reordering the operations of a neural network for parallel execution | |
CN109978143B (en) | Stacked autoencoder based on SIMD architecture and encoding method | |
CN111330255B (en) | Move generation method for the game of Amazons based on a deep convolutional neural network | |
CN116562218B (en) | Method and system for floorplanning of rectangular macro-cells based on reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170308 |